Abstract

We investigate the modeling and the numerical solution of machine learning problems with 
prediction functions which are linear combinations of elements of a possibly infinite-dimensional 
dictionary. We propose a novel flexible composite regularization model, which makes it possible to 
incorporate various priors on the coefficients of the prediction function, including sparsity and hard 
constraints. We show that the estimators obtained by minimizing the regularized empirical risk are 
consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding 
algorithm for computing such estimators. New results on the asymptotic behavior of the proximal 
forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features an o(1/m) convergence rate in
objective values.


1 Introduction 


A central task in data science is to extract information from collected observations. Optimization 
procedures play a central role in the modeling and the numerical solution of data-driven information 
extraction problems. In the present paper, we consider the problem of learning from examples within 
the framework of generalized linear models [5, 19, 21]. The goal is to estimate a functional relation
$f$ from an input set $\mathcal{X}$ into an output set $\mathcal{Y} \subset \mathbb{R}$. The data set consists of the observation of a finite
number of realizations $z_n = (x_i, y_i)_{1 \le i \le n}$ in $\mathcal{X} \times \mathcal{Y}$ of independent input/output random pairs with

*The work of P. L. Combettes was supported by the CNRS MASTODONS project under grant 2013MesureHD and by the 
CNRS Imag’in project under grant 2015OPTIMISME. 
an unknown common distribution $P$. We adopt a generalized linear model, i.e., we assume that the
target function $f$ can be approximated by estimators of the form

$$f_u = \sum_{k \in \mathbb{K}} \mu_k \phi_k, \tag{1.1}$$


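where $\mathbb{K}$ is at most countable, $u = (\mu_k)_{k \in \mathbb{K}} \in \ell^2(\mathbb{K})$, and $(\phi_k)_{k \in \mathbb{K}}$ is a family of bounded measurable
functions from $\mathcal{X}$ to $\mathbb{R}$; such a family is called a dictionary, and its elements are called features. The
estimator $u_{n,\lambda}$ is computed via the approximate minimization of the convex regularized empirical
risk

$$\frac{1}{n} \sum_{i=1}^{n} |f_u(x_i) - y_i|^2 + \lambda \sum_{k \in \mathbb{K}} g_k(\mu_k), \tag{1.2}$$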


where $\lambda \in \mathbb{R}_{++}$ and where the convex regularization functions $(g_k)_{k \in \mathbb{K}}$ enforce or promote prior
knowledge on the coefficients $(\mu_k)_{k \in \mathbb{K}}$ of the decomposition of the target function $f$ with respect to
the dictionary. Our objective is to select a family of regularizers $(g_k)_{k \in \mathbb{K}}$ that model a broad range
of prior knowledge and, at the same time, lead to implementable solution algorithms that produce
consistent estimators as the sample size $n$ becomes arbitrarily large. To satisfy this dual objective, we
shall focus our attention on the following flexible composite model: each function $g_k\colon \mathbb{R} \to\, ]-\infty, +\infty]$
is of the form

$$g_k = \iota_{C_k} + \sigma_{D_k} + h_k, \qquad h_k - \eta|\cdot|^r \in \Gamma_0^+(\mathbb{R}), \quad r \in\, ]1,2], \quad \eta \in \mathbb{R}_{++}, \tag{1.3}$$


where $\iota_{C_k}$ is the indicator function of a closed interval $C_k \subset \mathbb{R}$, $\sigma_{D_k}$ is the support function of an
interval $D_k \subset \mathbb{R}$, $\eta \in \mathbb{R}_{++}$, and $h_k\colon \mathbb{R} \to \mathbb{R}_+$ is convex and such that $h_k(0) = 0$. In this model,
the role of $C_k$ is to explicitly enforce hard constraints and the role of $D_k$ is to promote sparsity
[9]. On the other hand, $h_k$ provides stability and will be seen to be instrumental in guaranteeing
consistency. Note that the model (1.2)-(1.3) refines that considered in [9] and that it encompasses 
ridge regression [21, 22], elastic net [16, 34], bridge regression [20], and generalized Gaussian 
models [1]. Proximal thresholders [9], which extend the basic notion of a soft thresholder, will play 
a key role in our analysis. 
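
As an illustration of the composite model (1.3), the following minimal Python sketch evaluates one scalar regularizer in the power case $h_k = \eta|\cdot|^r$. The function name, argument names, and default values are assumptions made for this example only; they do not come from the paper.

```python
import math


def composite_regularizer(mu, C=(-math.inf, math.inf), D=(-1.0, 1.0), eta=0.5, r=2.0):
    """Evaluate g(mu) = iota_C(mu) + sigma_D(mu) + eta * |mu|**r.

    C = (c_lo, c_hi) is the hard-constraint interval and D = (d_lo, d_hi)
    is the bounded interval whose support function promotes sparsity.
    All names and defaults here are illustrative assumptions.
    """
    c_lo, c_hi = C
    d_lo, d_hi = D
    # Indicator of C: +infinity outside the constraint interval.
    if not (c_lo <= mu <= c_hi):
        return math.inf
    # Support function of [d_lo, d_hi]: sup over d of d * mu.
    support = d_hi * mu if mu >= 0 else d_lo * mu
    return support + eta * abs(mu) ** r
```

With `C` equal to the whole real line, `D = (-w, w)`, and `r = 2`, the value reduces to `w * abs(mu) + eta * mu**2`, which is the elastic net penalty mentioned above.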

The main objective of our paper is to investigate statistical and algorithmic aspects of the estimators based on (1.2)-(1.3). Our main contributions are the following:

• We prove the consistency of the estimators $(f_{n,\lambda_n})_{n \in \mathbb{N}}$ as $n \to +\infty$, as well as the convergence
of the corresponding coefficients $(u_{n,\lambda_n})_{n \in \mathbb{N}}$ in $\ell^r(\mathbb{K})$. This generalizes in particular the analysis
of [16], which corresponds to the special case when $C_k = \mathbb{R}$, $D_k = [-\omega_k, \omega_k]$, and $h_k = \eta|\cdot|^2$. In
this case, (1.3) reduces to

$$g_k = \omega_k |\cdot| + \eta |\cdot|^2. \tag{1.4}$$


• We establish new asymptotic properties for an error-tolerant forward-backward splitting algorithm based on proximal thresholders. In particular, we establish new minimizing properties
and a rate of convergence o(1/m) for the objective function values in the presence of variable
proximal parameters, relaxations, and computational errors. These results, which are of interest
in their own right, improve on the state of the art, which considers either the error-free case
and the non-relaxed version [4, 15], or convergence only in an ergodic sense [28].

The paper is organized as follows. In Section 2, we set the problem formally and present the main
results concerning the statistical and algorithmic issues pertaining to the proposed estimators. Section 3 is devoted to proving the consistency of the estimators, which is established in Theorem 2.4. In
Section 4, we establish Theorem 2.7, which concerns the asymptotic behavior of a proximal forward-backward splitting algorithm, and Theorem 2.11, which specifically deals with the structure considered in (1.2)-(1.3). Additional properties of the regularizers defined in (1.2) are studied in Appendices A and B.

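Notation. $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$, $\mathbb{R}_+ = [0,+\infty[$, and $\mathbb{R}_{++} = \,]0,+\infty[$. Throughout, $\mathbb{K}$ is an at most
countably infinite index set. We denote by $(e_k)_{k \in \mathbb{K}}$ the canonical orthonormal basis of $\ell^2(\mathbb{K})$. The
canonical norm of $\ell^r(\mathbb{K})$ is denoted by $\|\cdot\|_r$. Let $\mathcal{H}$ be a real Hilbert space. We denote by $\langle \cdot \mid \cdot \rangle$
and $\|\cdot\|$ the scalar product and the associated norm of $\mathcal{H}$. The set of proper lower semicontinuous convex functions from $\mathcal{H}$ to $]-\infty,+\infty]$ is denoted by $\Gamma_0(\mathcal{H})$, and the subset of $\Gamma_0(\mathcal{H})$ of
functions valued in $[0,+\infty]$ by $\Gamma_0^+(\mathcal{H})$. Let $\varphi \in \Gamma_0(\mathcal{H})$. The subdifferential of $\varphi$ at $u \in \mathcal{H}$ is
$\partial\varphi(u) = \{u^* \in \mathcal{H} \mid (\forall v \in \mathcal{H})\ \varphi(u) + \langle v - u \mid u^* \rangle \le \varphi(v)\}$ and, for every $\epsilon \in \mathbb{R}_{++}$, $\operatorname{Argmin}^{\epsilon}_{\mathcal{H}} \varphi =
\{u \in \mathcal{H} \mid \varphi(u) \le \inf \varphi(\mathcal{H}) + \epsilon\}$. Let $\mathcal{D} \subset \mathcal{H}$. The indicator function of $\mathcal{D}$ is denoted by $\iota_{\mathcal{D}}$ and the
support function of $\mathcal{D}$ is $\sigma_{\mathcal{D}}\colon \mathcal{H} \to\, ]-\infty,+\infty]\colon u \mapsto \sup_{v \in \mathcal{D}} \langle v \mid u \rangle$. Let $u \in \mathcal{H}$. Then $\operatorname{prox}_{\varphi} u =
\operatorname{argmin}_{v \in \mathcal{H}} \big(\varphi(v) + (1/2)\|u - v\|^2\big)$ [24]. Suppose that $\mathcal{D}$ is a nonempty, closed, and convex subset of
$\mathcal{H}$. Then $\operatorname{prox}_{\iota_{\mathcal{D}}} = \operatorname{proj}_{\mathcal{D}}$ is the projection operator onto $\mathcal{D}$, and $\operatorname{prox}_{\sigma_{\mathcal{D}}} = \operatorname{Id} - \operatorname{proj}_{\mathcal{D}} = \operatorname{soft}_{\mathcal{D}}$ is the
soft-thresholder with respect to $\mathcal{D}$. For background on convex analysis and optimization, see [3].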
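
For a closed interval of the real line, the two operators just recalled take a particularly simple form. The sketch below is illustrative only (the function names are assumptions for this example); it implements the projection onto $[lo, hi]$ and the associated soft-thresholder as the identity minus the projection.

```python
def proj_interval(x, lo, hi):
    """Projection of the real number x onto the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def soft_interval(x, lo, hi):
    """Soft-thresholder with respect to [lo, hi], i.e. Id - proj."""
    return x - proj_interval(x, lo, hi)
```

For the symmetric interval $[-1, 1]$ this is the classical soft thresholding at level 1, for instance `soft_interval(3.0, -1.0, 1.0)` evaluates to `2.0`, while every input inside the interval is mapped to zero.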


2 Problem setting and main results 

The following assumption will be made in our main results. 

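Assumption 2.1 $\mathcal{X}$ is a measurable space, $\mathcal{Y} \subset \mathbb{R}$ is a nonempty bounded interval, and $b = \sup_{y \in \mathcal{Y}} |y|$. Moreover, $P$ is a probability measure on $\mathcal{X} \times \mathcal{Y}$ with marginal $P_{\mathcal{X}}$ on $\mathcal{X}$. The risk is

$$R\colon L^2(P_{\mathcal{X}}) \to \mathbb{R}_+\colon f \mapsto \int_{\mathcal{X} \times \mathcal{Y}} |f(x) - y|^2 \, dP(x,y) \tag{2.1}$$

and $(\phi_k)_{k \in \mathbb{K}}$ is a family of measurable functions from $\mathcal{X}$ to $\mathbb{R}$ such that, for some $\kappa \in \mathbb{R}_{++}$,

$$(\forall x \in \mathcal{X}) \quad \sum_{k \in \mathbb{K}} |\phi_k(x)|^2 \le \kappa^2. \tag{2.2}$$

The feature map is

$$\Phi\colon \mathcal{X} \to \ell^2(\mathbb{K})\colon x \mapsto (\phi_k(x))_{k \in \mathbb{K}} \tag{2.3}$$

and

$$A\colon \ell^2(\mathbb{K}) \to \mathbb{R}^{\mathcal{X}}\colon u = (\mu_k)_{k \in \mathbb{K}} \mapsto f_u = \sum_{k \in \mathbb{K}} \mu_k \phi_k \quad \text{(pointwise)}. \tag{2.4}$$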


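In addition:

(a) $(C_k)_{k \in \mathbb{K}}$ is a family of closed intervals in $\mathbb{R}$ such that $0 \in \bigcap_{k \in \mathbb{K}} C_k$.

(b) $(D_k)_{k \in \mathbb{K}}$ is a family of nonempty closed bounded intervals in $\mathbb{R}$ such that $\sum_{k \in \mathbb{K}} \big((\inf D_k)_+\big)^{r^*} < +\infty$ and $\sum_{k \in \mathbb{K}} \big((\sup D_k)_-\big)^{r^*} < +\infty$.

(c) $(h_k)_{k \in \mathbb{K}}$ is a family in $\Gamma_0^+(\mathbb{R})$ such that $(\forall k \in \mathbb{K})$ $h_k(0) = 0$ and $h_k - \eta|\cdot|^r \in \Gamma_0^+(\mathbb{R})$ for some $r \in\, ]1,2]$ and $\eta \in \mathbb{R}_{++}$.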

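We define

$$\begin{cases}
(\forall k \in \mathbb{K}) \quad g_k = \iota_{C_k} + \sigma_{D_k} + h_k \\
F = R \circ A\colon \ell^2(\mathbb{K}) \to \mathbb{R}_+ \\
G\colon \ell^2(\mathbb{K}) \to\, ]-\infty,+\infty]\colon u = (\mu_k)_{k \in \mathbb{K}} \mapsto \sum_{k \in \mathbb{K}} g_k(\mu_k) \\
C = \overline{A\big(\ell^2(\mathbb{K}) \cap \times_{k \in \mathbb{K}} C_k\big)} \quad \text{(closure is taken in } L^2(P_{\mathcal{X}})\text{)}.
\end{cases} \tag{2.5}$$

$(X_i, Y_i)_{i \in \mathbb{N}}$ is a sequence of i.i.d. random variables, on an underlying probability space $(\Omega, \mathfrak{A}, \mathsf{P})$,
taking values in $\mathcal{X} \times \mathcal{Y}$ and distributed according to $P$. For every $n \in \mathbb{N}^*$, $Z_n = (X_i, Y_i)_{1 \le i \le n}$. The
function $\epsilon\colon \mathbb{R}_{++} \to [0,1]$ satisfies $\epsilon(\lambda) \to 0$ as $\lambda \to 0^+$. Moreover, for every $n \in \mathbb{N}^*$, every $\lambda \in \mathbb{R}_{++}$,
and every training set $z_n = (x_i, y_i)_{1 \le i \le n} \in (\mathcal{X} \times \mathcal{Y})^n$,

$$u_{n,\lambda}(z_n) \in \operatorname{Argmin}^{\epsilon(\lambda)}_{u \in \ell^2(\mathbb{K})} \Big( \frac{1}{n} \sum_{i=1}^{n} |f_u(x_i) - y_i|^2 + \lambda G(u) \Big). \tag{2.6}$$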


Remark 2.2 

(i) The proposed learning method falls into the class of regularized empirical risk minimization 
algorithms. However, it differs from the classical setting which uses the squared norm as a 
regularizer [13, 18, 19]. 

(ii) The conditions on the sequences $((\inf D_k)_+)_{k \in \mathbb{K}}$ and $((\sup D_k)_-)_{k \in \mathbb{K}}$ given in Assumption 2.1
ensure that $G \in \Gamma_0(\ell^2(\mathbb{K}))$. Moreover, $\operatorname{dom} G \subset \ell^r(\mathbb{K})$ and $G$ is bounded from below and
coercive (see Lemma A.1).

(iii) It follows from (2.2) that the linear operator $A$ is well defined and continuous with respect to
the topology of pointwise convergence on $\mathbb{R}^{\mathcal{X}}$, that $\operatorname{ran} A \subset L^\infty(P_{\mathcal{X}})$, and that $A\colon \ell^2(\mathbb{K}) \to
L^2(P_{\mathcal{X}})$ is a bounded linear operator such that $\|A\| \le \kappa$. The feature map $\Phi$ and $A$ are connected
via the identities

$$(\forall k \in \mathbb{K})(\forall x \in \mathcal{X}) \quad \langle \Phi(x) \mid e_k \rangle = (A e_k)(x). \tag{2.7}$$

In [16, Proposition 3] it is shown that $\operatorname{ran} A$ can be endowed with a reproducing kernel Hilbert
space structure for which $A$ becomes a partial isometry, and the corresponding reproducing
kernel is

$$K\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}\colon (x, x') \mapsto \sum_{k \in \mathbb{K}} \phi_k(x)\phi_k(x'). \tag{2.8}$$
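
To make the objects of Remark 2.2 concrete, the sketch below uses a hypothetical three-element dictionary (an assumption chosen for this example, not taken from the paper) and computes the feature map, the prediction function $f_u$, and the kernel (2.8) as an inner product of feature vectors. The dictionary is scaled so that the sum of squared features equals 1/2 at every point, hence the bound (2.2) holds with $\kappa^2 = 1/2$.

```python
import math

# Hypothetical dictionary (assumption for illustration only):
# phi_0 = 1/2, phi_1 = sin/2, phi_2 = cos/2, so sum_k phi_k(x)^2 = 1/2.
DICTIONARY = (
    lambda x: 0.5,
    lambda x: 0.5 * math.sin(x),
    lambda x: 0.5 * math.cos(x),
)
KAPPA = math.sqrt(0.5)


def feature_map(x):
    """Feature vector (phi_k(x))_k of the point x."""
    return [phi(x) for phi in DICTIONARY]


def predict(u, x):
    """f_u(x) = sum_k mu_k * phi_k(x) for a coefficient list u."""
    return sum(mu * value for mu, value in zip(u, feature_map(x)))


def kernel(x, x_prime):
    """K(x, x') = sum_k phi_k(x) * phi_k(x')."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(x_prime)))
```

In this toy setting the kernel has the closed form $1/4 + (1/4)\cos(x - x')$, and the Cauchy-Schwarz inequality gives $|f_u(x)| \le \kappa \|u\|_2$, which is the pointwise counterpart of the bound $\|A\| \le \kappa$.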

In the above setting, the goal is to minimize the risk $R$ of (2.1) on the closed convex subset $C$ of
$L^2(P_{\mathcal{X}})$ using the $n$ i.i.d. observations $Z_n = (X_i, Y_i)_{1 \le i \le n}$. In this respect, recall that the regression
function $f^\dagger$ is the minimizer of the risk on $L^2(P_{\mathcal{X}})$ and that

$$(\forall f \in L^2(P_{\mathcal{X}})) \quad R(f) - \inf R(L^2(P_{\mathcal{X}})) = \|f - f^\dagger\|_{L^2}^2. \tag{2.9}$$






This means that minimizing $R$ on $L^2(P_{\mathcal{X}})$ is equivalent to approximating the regression function $f^\dagger$.
In our constrained setting, the solution to the regression problem on $C$ results in a target function $f_C$
with the following properties.

Proposition 2.3 Suppose that Assumption 2.1 is in force. Then there exists a unique $f_C \in C$ such that
$R(f_C) = \inf R(C)$. Moreover, the following hold:

(i) $f_C$ is the projection of $f^\dagger$ onto $C$ in $L^2(P_{\mathcal{X}})$.

(ii) $(\forall f \in C)$ $\|f - f_C\|_{L^2}^2 \le R(f) - \inf R(C)$.

(iii) $(\forall f \in C)$ $R(f) - \inf R(C) \le 2\Big[\Big(\|f - f_C\|_{L^2} + \sqrt{\inf R(C) - \inf R(L^2(P_{\mathcal{X}}))}\Big)^2 + \inf R(L^2(P_{\mathcal{X}}))\Big]^{1/2} \|f - f_C\|_{L^2}$.

Proposition 2.3 states that, as in the unconstrained case, minimizing the risk over $C$ is still equivalent to approaching $f_C$ in $L^2(P_{\mathcal{X}})$. It is worth noting that we do not assume that $f_C = f_u$ for
some $u \in \operatorname{dom} G$, since the infimum of $R$ on $A(\operatorname{dom} G)$ may not be attained. A consistent learning scheme generates a random variable $u_{n,\lambda_n}(Z_n)$, taking values in $\ell^2(\mathbb{K})$, from $n$ i.i.d. observations
$Z_n = (X_i, Y_i)_{1 \le i \le n}$, so that the resulting sequence of random functions $(f_n)_{n \in \mathbb{N}} = (A u_{n,\lambda_n}(Z_n))_{n \in \mathbb{N}}$ is
weakly consistent in the sense that

$$R(f_n) \to \inf R(C) \ \text{in probability} \quad \Leftrightarrow \quad \|f_n - f_C\|_{L^2} \to 0 \ \text{in probability}, \tag{2.10}$$

or strongly consistent in the sense that

$$R(f_n) \to \inf R(C) \ \ \mathsf{P}\text{-a.s.} \quad \Leftrightarrow \quad \|f_n - f_C\|_{L^2} \to 0 \ \ \mathsf{P}\text{-a.s.}, \tag{2.11}$$

depending on the assumption on the regularization parameters $(\lambda_n)_{n \in \mathbb{N}}$.

Next, we first state our consistency result and then present an algorithm to compute the proposed 
estimators. 

Theorem 2.4 Suppose that Assumption 2.1 is in force and let $f_C$ be defined as in Proposition 2.3. Let
$(\lambda_n)_{n \in \mathbb{N}}$ be a sequence in $]0,+\infty[$ converging to 0 and, for every $n \in \mathbb{N}$, let $f_n = A u_{n,\lambda_n}(Z_n)$. Then the
following hold:

(i) Suppose that $\epsilon(\lambda_n)/\lambda_n^{4/r} \to 0$ and that $1/(\lambda_n^{2/r} n^{1/2}) \to 0$. Then $(f_n)_{n \in \mathbb{N}}$ is weakly consistent, i.e.,
$\|f_n - f_C\|_{L^2} \to 0$ in probability.

(ii) Suppose that $\epsilon(\lambda_n) = O(1/n)$ and that $(\log n)/(\lambda_n^{2/r} n^{1/2}) \to 0$. Then $(f_n)_{n \in \mathbb{N}}$ is strongly consistent, i.e., $\|f_n - f_C\|_{L^2} \to 0$ $\mathsf{P}$-a.s.

(iii) Suppose that $f_C \in A(\operatorname{dom} G)$ and set $S = \operatorname{Argmin}_{\operatorname{dom} G} F$. Then there exists a unique $u^\dagger \in S$
which minimizes $G$ over $S$ and $A u^\dagger = f_C$. Moreover, the following hold:

(a) Suppose that $\epsilon(\lambda_n)/\lambda_n^2 \to 0$ and that $1/(\lambda_n n^{1/2}) \to 0$. Then

$$\|u_{n,\lambda_n}(Z_n) - u^\dagger\|_r \to 0 \ \text{in probability}. \tag{2.12}$$

(b) Suppose that $\epsilon(\lambda_n) = O(1/n)$ and that $(\log n)/(\lambda_n n^{1/2}) \to 0$. Then

$$\|u_{n,\lambda_n}(Z_n) - u^\dagger\|_r \to 0 \ \ \mathsf{P}\text{-a.s.} \tag{2.13}$$


Remark 2.5 

(i) In Theorem 2.4(i)-(ii) the weakest conditions on the regularization parameters $(\lambda_n)_{n \in \mathbb{N}}$ occur
when $r = 2$. In the case considered in (iii), the consistency conditions do not depend on the
exponent $r$.

(ii) In the special case when, in (1.3), for every $k \in \mathbb{K}$, $h_k = \eta|\cdot|^2$, $C_k = \mathbb{R}$, and $D_k = [-\omega_k, \omega_k]$ for some
$\omega_k \in \mathbb{R}_{++}$, we recover the elastic net framework of [16] and the same consistency conditions as
in [16, Theorem 2 and Theorem 3]. This special case yields a strongly convex problem. In our
general setting, the exponent $r$ may take any value in $]1,2]$ and the objective function is only
totally convex on bounded sets (see Lemma 3.1). Note also that our framework allows for the
enforcement of hard constraints.

(iii) Under the hypotheses of (iii), the consistency extends to the sequence of coefficients
$(u_{n,\lambda_n}(Z_n))_{n \in \mathbb{N}}$. This is relevant when one requires the estimators to mimic the properties
of $u^\dagger$.

(iv) When $\mathbb{K}$ is finite and, for every $k \in \mathbb{K}$, $g_k = |\cdot|^r$, [23] provides an excess risk bound depending
on the cardinality of $\mathbb{K}$ and the level of sparsity of $u^\dagger$ (see also [20]). The case $r = 1$ has been
considered in [14]. Appendix B collects useful properties of the proximity operators of power
functions.
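
The proximity operator of a power function $c|\cdot|^r$ with $r \in\, ]1,2]$ generally has no simple closed form, except for special exponents such as $r = 2$ and $r = 3/2$. As a hedged illustration (this is not the procedure of Appendix B, and the function name, tolerance, and iteration cap are assumptions for this example), the sketch below computes it numerically by bisection on the scalar optimality condition $p + c\,r\,p^{r-1} = |x|$, whose left-hand side is increasing in $p \ge 0$.

```python
import math


def prox_power(x, c, r, tol=1e-12, max_iter=200):
    """Approximate prox of c * |.|**r at x, for c >= 0 and r > 1.

    The minimizer p has the sign of x and its modulus solves
    p + c * r * p**(r - 1) = |x| on [0, |x|]; we bracket it by bisection.
    """
    if x == 0.0:
        return 0.0
    target = abs(x)
    lo, hi = 0.0, target
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid + c * r * mid ** (r - 1.0) > target:
            hi = mid  # mid overshoots the optimality condition
        else:
            lo = mid
        if hi - lo <= tol:
            break
    return math.copysign(0.5 * (lo + hi), x)
```

For $r = 2$ the result can be compared with the closed form $x/(1 + 2c)$, and for $r = 3/2$ with the square of the positive root of $t^2 + (3c/2)t - |x| = 0$. The stopping tolerance plays the role of the admissible error discussed later for inexact proximal computations.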


We now address the algorithmic aspects. The objective function in (2.6) consists of a smooth
(quadratic) data fitting term and a separable nondifferentiable term, penalizing each dictionary coefficient individually. Thus a natural choice is to consider the forward-backward splitting algorithm
[12]. We stress that, since $\epsilon$-minimizers are employed in (2.6), algorithms that provide minimizing
sequences are necessary. However, when convergence in objective function values is in order, the
current theory is not completely satisfying. Indeed, the available results consider only the error-free
case and the unrelaxed version [4, 15]. In [28], errors are considered, but only ergodic convergence
is proved. In Theorem 2.7 below, we fill this gap by proving an o(1/m) rate of convergence in
objective values with relaxation and in the presence of the following type of errors.


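Definition 2.6 Let $\mathcal{H}$ be a real Hilbert space, let $\varphi \in \Gamma_0(\mathcal{H})$, let $(u, w) \in \mathcal{H}^2$, and let $\delta \in \mathbb{R}_+$. The
notation $u \simeq_\delta \operatorname{prox}_\varphi w$ means that

$$\varphi(u) + \frac{1}{2}\|u - w\|^2 \le \min_{v \in \mathcal{H}} \Big(\varphi(v) + \frac{1}{2}\|v - w\|^2\Big) + \frac{\delta^2}{2}. \tag{2.14}$$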


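Theorem 2.7 Let $\mathcal{H}$ be a real Hilbert space, let $F\colon \mathcal{H} \to \mathbb{R}$ be a convex function which is differentiable on
$\mathcal{H}$ with a $\beta$-Lipschitz continuous gradient for some $\beta \in \mathbb{R}_{++}$. Let $G \in \Gamma_0(\mathcal{H})$, set $J = F + G$, and suppose
that $\operatorname{Argmin} J \ne \varnothing$. Let $(\gamma_m)_{m \in \mathbb{N}}$ be a sequence in $\mathbb{R}_{++}$ such that $0 < \inf_{m \in \mathbb{N}} \gamma_m \le \sup_{m \in \mathbb{N}} \gamma_m < 2/\beta$,
and let $(\tau_m)_{m \in \mathbb{N}}$ be a sequence in $]0,1]$ such that $\inf_{m \in \mathbb{N}} \tau_m > 0$. Let $(\delta_m)_{m \in \mathbb{N}}$ be a summable sequence in
$\mathbb{R}_+$ and let $(b_m)_{m \in \mathbb{N}}$ be a summable sequence in $\mathcal{H}$. Fix $u_0 \in \mathcal{H}$ and set

for $m = 0, 1, \ldots$
    $v_m \simeq_{\delta_m} \operatorname{prox}_{\gamma_m G}\big(u_m - \gamma_m(\nabla F(u_m) + b_m)\big)$
    $u_{m+1} = u_m + \tau_m(v_m - u_m).$ (2.15)

Then the following hold:

(i) $(u_m)_{m \in \mathbb{N}}$ converges weakly to a point in $\operatorname{Argmin} J$.

(ii) For every $u \in \operatorname{Argmin} J$, $\sum_{m \in \mathbb{N}} \|\nabla F(u_m) - \nabla F(u)\|^2 < +\infty$.

(iii) $\sum_{m \in \mathbb{N}} \|v_m - u_m\|^2 < +\infty$.

(iv) $J(u_m) \to \inf J(\mathcal{H})$ and $\sum_{m \in \mathbb{N}} \big(J(v_m) - \inf J(\mathcal{H})\big)^2 < +\infty$.

(v) Suppose that $\sum_{m \in \mathbb{N}} (1 - \tau_m) < +\infty$. Then
$\sum_{m \in \mathbb{N}} \big(J(v_m) - \inf J(\mathcal{H})\big) < +\infty$ and $\sum_{m \in \mathbb{N}} \big(J(u_m) - \inf J(\mathcal{H})\big) < +\infty$.

(vi) Suppose that $\sum_{m \in \mathbb{N}} (1 - \tau_m) < +\infty$, $\sum_{m \in \mathbb{N}} m\,\delta_m < +\infty$, and $\sum_{m \in \mathbb{N}} m\,\|b_m\| < +\infty$. Then
$J(u_m) - \inf J(\mathcal{H}) = o(1/m)$.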

Remark 2.8 In [4], the rate $O(1/m)$ for objective values is proved in the error-free case and with no
relaxations ($\delta_m = 0$ and $\tau_m = 1$), assuming that $F + G$ is coercive. On the other hand, an $o(1/m)$
rate on the objective values was derived in [15] in the special case of fixed proximal parameter
$\gamma \in\, ]0, 2/\beta[$, no relaxation, and no errors.
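
The iteration (2.15) can be sketched in a few lines in the error-free setting ($\delta_m = 0$, $b_m = 0$) with constant parameters. The code below is only an illustration under these simplifying assumptions; the function names, the toy problem (a diagonal quadratic plus an $\ell^1$ penalty), and the parameter values are all hypothetical choices for this example.

```python
def forward_backward(grad_F, prox_G, u0, gamma, tau, n_iter):
    """Relaxed forward-backward iteration with exact proximal steps.

    grad_F(u) returns the gradient of the smooth term as a list,
    prox_G(w, gamma) returns prox of gamma * G at w as a list.
    gamma should lie in ]0, 2/beta[ and tau in ]0, 1].
    """
    u = list(u0)
    for _ in range(n_iter):
        g = grad_F(u)
        # Forward (gradient) step followed by backward (proximal) step.
        v = prox_G([ui - gamma * gi for ui, gi in zip(u, g)], gamma)
        # Relaxation step.
        u = [ui + tau * (vi - ui) for ui, vi in zip(u, v)]
    return u


# Toy problem (assumption): F(u) = 0.5 * sum_k a_k * (u_k - t_k)^2, G = omega * ||u||_1.
A_DIAG = [1.0, 4.0]   # curvatures, so beta = max(A_DIAG) = 4
TARGET = [2.0, 0.1]
OMEGA = 0.5


def toy_grad(u):
    return [a * (ui - t) for a, ui, t in zip(A_DIAG, u, TARGET)]


def toy_prox(w, gamma):
    # Componentwise soft thresholding at level gamma * OMEGA.
    return [wi - min(max(wi, -gamma * OMEGA), gamma * OMEGA) for wi in w]
```

For this separable toy problem the minimizer is known componentwise: it is the soft thresholding of `TARGET[k]` at level `OMEGA / A_DIAG[k]`, so running `forward_backward(toy_grad, toy_prox, [0.0, 1.0], gamma=0.4, tau=0.8, n_iter=200)` can be compared against it.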

We now propose the following inexact forward-backward algorithm to solve problem (1.2).

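Algorithm 2.9 Let $(\gamma_m)_{m \in \mathbb{N}}$ be a sequence in $\mathbb{R}_{++}$ such that $0 < \inf_{m \in \mathbb{N}} \gamma_m \le \sup_{m \in \mathbb{N}} \gamma_m < \lambda/\kappa^2$, and let
$(\tau_m)_{m \in \mathbb{N}}$ be a sequence in $]0,1]$ such that $\inf_{m \in \mathbb{N}} \tau_m > 0$. Let $(b_m)_{m \in \mathbb{N}} = ((\beta_{m,k})_{k \in \mathbb{K}})_{m \in \mathbb{N}} \in (\ell^2(\mathbb{K}))^{\mathbb{N}}$
be such that $\sum_{m \in \mathbb{N}} \|b_m\| < +\infty$, let $\zeta \in \mathbb{R}_{++}$, let $p \in\, ]1,+\infty[$, and let $(\xi_k)_{k \in \mathbb{K}} \in \ell^2(\mathbb{K})$. Fix $(\mu_{0,k})_{k \in \mathbb{K}} \in
\ell^2(\mathbb{K})$ and iterate

for $m = 0, 1, \ldots$
    for every $k \in \mathbb{K}$
        $\chi_{m,k} = \mu_{m,k} - \gamma_m \Big( \dfrac{2}{\lambda n} \sum_{i=1}^{n} \Big( \sum_{j \in \mathbb{K}} \mu_{m,j}\phi_j(x_i) - y_i \Big) \phi_k(x_i) + \beta_{m,k} \Big)$
        $|\alpha_{m,k}| \le \dfrac{\zeta\, m^{-2p}\, \xi_k^2}{4\gamma_m \max\{h_k(|\chi_{m,k}| + 2),\, h_k(-|\chi_{m,k}| - 2)\} + 2|\chi_{m,k}| + 1}$
        $\pi_{m,k} = \operatorname{prox}_{\gamma_m h_k}\big(\operatorname{soft}_{\gamma_m D_k} \chi_{m,k}\big) + \alpha_{m,k}$
        $\nu_{m,k} = \operatorname{proj}_{C_k}\big(\operatorname{sign}(\chi_{m,k}) \max\{0, \operatorname{sign}(\chi_{m,k})\pi_{m,k}\}\big)$
        $\mu_{m+1,k} = \mu_{m,k} + \tau_m(\nu_{m,k} - \mu_{m,k}).$ (2.16)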


An attractive feature of Algorithm 2.9 is that, at each iteration, each component of the functions 
in (2.5) is activated componentwise and individually. 

Remark 2.10 Nesterov-like [25] variants of the forward-backward splitting algorithm may also be 
suitable for computing the estimators (2.6) to the extent that they also generate minimizing sequences 
[28, 30]. 
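
Theorem 2.11 Suppose that Assumption 2.1 is in force. Call

$$J\colon \ell^2(\mathbb{K}) \to\, ]-\infty,+\infty]\colon u = (\mu_k)_{k \in \mathbb{K}} \mapsto \frac{1}{n} \sum_{i=1}^{n} |f_u(x_i) - y_i|^2 + \lambda \sum_{k \in \mathbb{K}} g_k(\mu_k) \tag{2.17}$$

the objective function in (2.6), and let $(u_m)_{m \in \mathbb{N}} = ((\mu_{m,k})_{k \in \mathbb{K}})_{m \in \mathbb{N}}$ and $(v_m)_{m \in \mathbb{N}} = ((\nu_{m,k})_{k \in \mathbb{K}})_{m \in \mathbb{N}}$ be
the sequences generated by Algorithm 2.9. Then the following hold:

(i) $J$ has a unique minimizer $\bar{u}$, and $\bar{u} \in \ell^r(\mathbb{K})$.

(ii) $\sum_{m \in \mathbb{N}} \big(J(v_m) - \inf J(\ell^2(\mathbb{K}))\big)^2 < +\infty$, $J(u_m) \to \inf J(\ell^2(\mathbb{K}))$, $\|v_m - \bar{u}\|_r \to 0$, and $\|u_m - \bar{u}\|_r \to 0$ as
$m \to +\infty$. Moreover

$$\|v_m - \bar{u}\|_r = O\Big(\sqrt{J(v_m) - \inf J(\ell^2(\mathbb{K}))}\Big) \tag{2.18}$$

and

$$\|u_m - \bar{u}\|_r = O\Big(\sqrt{J(u_m) - \inf J(\ell^2(\mathbb{K}))}\Big). \tag{2.19}$$

(iii) Suppose that $\sum_{m \in \mathbb{N}} (1 - \tau_m) < +\infty$. Then
$\sum_{m \in \mathbb{N}} \big(J(v_m) - \inf J(\ell^2(\mathbb{K}))\big) < +\infty$ and $\sum_{m \in \mathbb{N}} \big(J(u_m) - \inf J(\ell^2(\mathbb{K}))\big) < +\infty$.

(iv) Suppose that $p > 2$, that $\sum_{m \in \mathbb{N}} (1 - \tau_m) < +\infty$, and that $\sum_{m \in \mathbb{N}} m\,\|b_m\| < +\infty$. Then

$$J(u_m) - \inf J(\ell^2(\mathbb{K})) = o(1/m) \quad \text{and} \quad \|u_m - \bar{u}\|_r = o(1/\sqrt{m}). \tag{2.20}$$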

Remark 2.12 


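(i) In Algorithm 2.9, the computation of $\operatorname{prox}_{\gamma_m h_k}$ tolerates an error $\alpha_{m,k}$. This is necessary since,
in general, the proximity operator is not computable explicitly. In such instances, $\operatorname{prox}_{\gamma_m h_k}$ must
be computed iteratively and the bound on $|\alpha_{m,k}|$ in Algorithm 2.9 gives an explicit stopping rule
for the iterations.

(ii) The soft-thresholding operator with respect to a bounded interval $D_k = [\underline{\omega}_k, \overline{\omega}_k] \subset \mathbb{R}$ is

$$(\forall \mu \in \mathbb{R}) \quad \operatorname{soft}_{D_k} \mu = \begin{cases} \mu - \overline{\omega}_k & \text{if } \mu > \overline{\omega}_k \\ 0 & \text{if } \mu \in D_k \\ \mu - \underline{\omega}_k & \text{if } \mu < \underline{\omega}_k. \end{cases} \tag{2.21}$$

The freedom in the choice of the intervals $(D_k)_{k \in \mathbb{K}}$, $(C_k)_{k \in \mathbb{K}}$, and of the exponent $r$ provides
flexibility in setting the type of thresholding operation. It is in particular possible to promote selective sparsity. For instance, taking $\inf D_k = 0$, only the positive coefficients are thresholded.

Figures 1 and 2 show a few examples.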
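
In the quadratic case $h_k = \eta|\cdot|^2$, the composition appearing in Algorithm 2.9 (soft thresholding, then the proximity operator of $\gamma h_k$, then projection onto $C_k$) can be evaluated exactly, since $\operatorname{prox}_{\gamma\eta|\cdot|^2}$ is a division by $1 + 2\gamma\eta$. The sketch below illustrates this special case only, with no computational error; the function and argument names are assumptions for this example.

```python
import math


def proj_interval(x, lo, hi):
    """Projection of x onto the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def composite_prox(x, gamma, C=(-math.inf, math.inf), D=(-1.0, 1.0), eta=0.5):
    """Prox of gamma * (iota_C + sigma_D + eta * |.|**2) at the scalar x.

    Step 1: soft thresholding with respect to gamma * D.
    Step 2: prox of gamma * eta * |.|**2, i.e. division by 1 + 2*gamma*eta.
    Step 3: projection onto the hard-constraint interval C.
    """
    d_lo, d_hi = D
    thresholded = x - proj_interval(x, gamma * d_lo, gamma * d_hi)
    shrunk = thresholded / (1.0 + 2.0 * gamma * eta)
    return proj_interval(shrunk, C[0], C[1])
```

With `D = (0.0, 1.0)` only positive inputs are thresholded: `composite_prox(2.0, 1.0, D=(0.0, 1.0))` gives `0.5`, whereas `composite_prox(-2.0, 1.0, D=(0.0, 1.0))` gives `-1.0`, the negative input being shrunk but not thresholded. Adding `C = (-math.inf, 0.3)` caps the first value at `0.3`.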

Figure 1: Soft thresholding (green) and $\operatorname{prox}_g$ for $g = |\cdot| + 0.9|\cdot|^r$, with $r = 2$ (red), $r = 3/2$ (orange),
$r = 4/3$ (blue).

Figure 2: $\operatorname{prox}_g$ for $g = \iota_{]-\infty, 6/5]} + \sigma_{[0,2]} + \cdots$

3 Statistical modeling and analysis 


Throughout the section Assumption 2.1 is made. Our main objective is to prove Theorem 2.4. 

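The following result establishes that $G$ is totally convex on bounded sets in $\ell^r(\mathbb{K})$ and gives an
explicit lower bound for the relative modulus of total convexity.

Lemma 3.1 Suppose that Assumption 2.1 is in force. Let $\rho \in \mathbb{R}_{++}$, let $u_0 \in \ell^r(\mathbb{K})$ be such that $\|u_0\|_r \le
\rho$, let $u_0^* \in \partial G(u_0)$, and set $M = (7/32)\, r(r-1)\big(1 - (2/3)^{r-1}\big)$. Then

$$(\forall u \in \ell^r(\mathbb{K})) \quad G(u) - G(u_0) \ge \langle u - u_0 \mid u_0^* \rangle + \frac{\eta M \|u - u_0\|_r^2}{(\rho + \|u - u_0\|_r)^{2-r}}. \tag{3.1}$$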

Proof. Let $G_|$ be the restriction of $G$ to $\ell^r(\mathbb{K})$, endowed with the norm $\|\cdot\|_r$. Since $u_0 \in \ell^r(\mathbb{K})$
and $u_0^* \in \ell^2(\mathbb{K}) \subset \ell^{r^*}(\mathbb{K})$, we have that $u_0^* \in \partial G_|(u_0)$. Let $\psi$ be the modulus of total convexity of $G_|$ and
let $\varphi$ be the modulus of total convexity of $\|\cdot\|_r^r$ in $\ell^r(\mathbb{K})$. Then, for every $u \in \ell^r(\mathbb{K})$, $G(u) - G(u_0) \ge
\langle u - u_0 \mid u_0^* \rangle + \psi(u_0; \|u - u_0\|_r)$. Moreover, since $G_| = H + \eta\|\cdot\|_r^r$, with $H \in \Gamma_0(\ell^r(\mathbb{K}))$ (see Lemma A.1),
we have $\psi \ge \eta\varphi$. The statement follows from [11, Proposition A.9-Remark A.10]. □

The next proposition revisits some results of [2] about Tikhonov-like regularization specialized to 
our setting. 

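Proposition 3.2 Suppose that Assumption 2.1 is in force. For every $(\lambda, \epsilon) \in \mathbb{R}_{++}^2$, let $u_{\lambda,\epsilon}$ be an $\epsilon$-minimizer of $F + \lambda G$ and let $u_G$ be the minimizer of $G$. Then the following hold:

(i) $\inf R(C) = \inf F(\operatorname{dom} G)$.

(ii) $(\forall (\lambda, \epsilon) \in \mathbb{R}_{++}^2)$ $\|u_{\lambda,\epsilon} - u_G\|_r \le \max\big\{\|u_G\|_r,\ \big(2(F(u_G) + \epsilon)/(\eta M \lambda)\big)^{1/r}\big\}$.

(iii) $F(u_{\lambda,\epsilon}) \to \inf F(\operatorname{dom} G)$ as $(\lambda, \epsilon) \to (0^+, 0^+)$.

(iv) Suppose that $S = \operatorname{Argmin}_{\operatorname{dom} G} F \ne \varnothing$ and let $\epsilon\colon \mathbb{R}_{++} \to [0,1]$ be such that $\epsilon(\lambda)/\lambda \to 0^+$ as
$\lambda \to 0^+$. Then there exists $u^\dagger \in \ell^r(\mathbb{K})$ such that $\operatorname{Argmin}_S G = \{u^\dagger\}$ and $u_{\lambda,\epsilon(\lambda)} \to u^\dagger$ as $\lambda \to 0^+$.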


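Proof. We first note that it follows from Remark 2.2(ii) that $G$ has a minimizer.

(i): Let $u = (\mu_k)_{k \in \mathbb{K}} \in \ell^2(\mathbb{K}) \cap \times_{k \in \mathbb{K}} C_k$ and take $\delta \in \mathbb{R}_{++}$. Then there exists a finite set $\mathbb{K}_1 \subset \mathbb{K}$
such that $\sum_{k \in \mathbb{K} \setminus \mathbb{K}_1} |\mu_k|^2 \le \delta^2$. Let $v = (\nu_k)_{k \in \mathbb{K}}$ be such that, for every $k \in \mathbb{K}_1$, $\nu_k = \mu_k$ and, for
every $k \in \mathbb{K} \setminus \mathbb{K}_1$, $\nu_k = 0$. We have $v \in \operatorname{dom} G$ and $\|Au - Av\|_{L^2} \le \|A\|\delta$. Thus, $C = \overline{A(\operatorname{dom} G)}$ and
the statement follows.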

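(ii): Let $(\lambda, \epsilon) \in \mathbb{R}_{++}^2$. We derive from the definition of $u_{\lambda,\epsilon}$ that $F(u_{\lambda,\epsilon}) + \lambda G(u_{\lambda,\epsilon}) \le F(u_G) +
\lambda G(u_G) + \epsilon$; hence, since $0 \in \partial G(u_G)$, it follows from Lemma 3.1 that

$$\frac{\eta M \|u_{\lambda,\epsilon} - u_G\|_r^2}{(\|u_G\|_r + \|u_{\lambda,\epsilon} - u_G\|_r)^{2-r}} \le G(u_{\lambda,\epsilon}) - G(u_G) \le \frac{F(u_G) + \epsilon}{\lambda}. \tag{3.2}$$

If $\|u_{\lambda,\epsilon} - u_G\|_r \ge \|u_G\|_r$, then

$$\frac{\eta M \|u_{\lambda,\epsilon} - u_G\|_r^2}{(\|u_G\|_r + \|u_{\lambda,\epsilon} - u_G\|_r)^{2-r}} \ge \frac{\eta M \|u_{\lambda,\epsilon} - u_G\|_r^2}{(2\|u_{\lambda,\epsilon} - u_G\|_r)^{2-r}} \ge \frac{\eta M}{2} \|u_{\lambda,\epsilon} - u_G\|_r^r \tag{3.3}$$

and hence $\|u_{\lambda,\epsilon} - u_G\|_r^r \le 2(F(u_G) + \epsilon)/(\eta M \lambda)$.

(iii): Let $u \in \operatorname{dom} G$. Then, for every $(\lambda, \epsilon) \in \mathbb{R}_{++}^2$,

$$\begin{aligned}
\inf F(\operatorname{dom} G) &\le F(u_{\lambda,\epsilon}) \\
&\le F(u_{\lambda,\epsilon}) + \lambda\big(G(u_{\lambda,\epsilon}) - G(u_G)\big) \\
&\le F(u) + \lambda\big(G(u) - G(u_G)\big) + \epsilon.
\end{aligned} \tag{3.4}$$

Hence

$$\begin{aligned}
\inf F(\operatorname{dom} G) &\le \liminf_{(\lambda,\epsilon) \to (0,0)} F(u_{\lambda,\epsilon}) \\
&\le \limsup_{(\lambda,\epsilon) \to (0,0)} F(u_{\lambda,\epsilon}) \\
&\le \lim_{(\lambda,\epsilon) \to (0,0)} \big(F(u) + \lambda(G(u) - G(u_G)) + \epsilon\big) \\
&\le F(u).
\end{aligned} \tag{3.5}$$

Since $u$ is arbitrary, the statement follows.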

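(iv): Since $S$ is convex and $G \in \Gamma_0(\ell^2(\mathbb{K}))$ is strictly convex, coercive, and $\operatorname{dom} G \subset \ell^r(\mathbb{K})$, it
follows from [3, Corollary 11.15(ii)] that there exists $u^\dagger \in \ell^r(\mathbb{K})$ such that $\operatorname{Argmin}_S G = \{u^\dagger\}$. Moreover,

$$G(u_{\lambda,\epsilon(\lambda)}) \le \big(F(u^\dagger) - F(u_{\lambda,\epsilon(\lambda)}) + \epsilon(\lambda)\big)/\lambda + G(u^\dagger) \le G(u^\dagger) + \epsilon(\lambda)/\lambda, \tag{3.6}$$

which implies that $(G(u_{\lambda,\epsilon(\lambda)}))_{\lambda \in \mathbb{R}_{++}}$ is bounded. Since $G$ is coercive, the family $(u_{\lambda,\epsilon(\lambda)})_{\lambda \in \mathbb{R}_{++}}$ is
bounded as well. We deduce from [33, Proposition 3.6.5] (see also [6]) that there exists an increasing
function $\phi\colon \mathbb{R}_+ \to \mathbb{R}_+$ such that $\phi(0) = 0$, for every $t \in \mathbb{R}_{++}$, $\phi(t) > 0$, and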


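$$(\forall \lambda \in \mathbb{R}_{++}) \quad \phi\big(\|u_{\lambda,\epsilon(\lambda)} - u^\dagger\|_r\big) \le \frac{G(u^\dagger) + G(u_{\lambda,\epsilon(\lambda)})}{2} - G\Big(\frac{u_{\lambda,\epsilon(\lambda)} + u^\dagger}{2}\Big). \tag{3.7}$$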


Hence, arguing as in [8, Proof of Proposition 3.1(vi)], we obtain $u_{\lambda,\epsilon(\lambda)} \to u^\dagger$ as $\lambda \to 0^+$. □

Next, we give a representer and stability theorem which generalizes existing results [17, 29] to 
our class of regularization functions. 


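Theorem 3.3 Suppose that Assumption 2.1 is in force. Set $M = (7/32)\, r(r-1)\big(1 - (2/3)^{r-1}\big)$, let
$\lambda \in \mathbb{R}_{++}$, and let $u_\lambda \in \ell^r(\mathbb{K})$ be the minimizer of $F + \lambda G$. Then the following hold:

(i) The function

$$\Psi_\lambda\colon \mathcal{X} \times \mathcal{Y} \to \ell^2(\mathbb{K})\colon (x, y) \mapsto 2\big(f_{u_\lambda}(x) - y\big)\Phi(x) \tag{3.8}$$

is bounded and $\|\Psi_\lambda\|_\infty \le 2\kappa(\kappa\|u_\lambda\|_2 + b)$. Moreover, $\mathsf{E}_P\|\Psi_\lambda\|_2^2 \le 4\kappa^2 R(f_{u_\lambda})$ and $-\mathsf{E}_P(\Psi_\lambda) \in
\lambda\,\partial G(u_\lambda)$.

(ii) Let $n \in \mathbb{N}^*$. Then there exists $v \in \ell^r(\mathbb{K})$ such that $\|v - u_{n,\lambda}(z_n)\|_r \le \sqrt{\epsilon(\lambda)}$ and

$$\frac{\eta M \|v - u_\lambda\|_r}{(\|u_\lambda\|_r + \|v - u_\lambda\|_r)^{2-r}} \le \frac{1}{\lambda} \Big( \Big\| \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \Big\|_2 + \sqrt{\epsilon(\lambda)} \Big). \tag{3.9}$$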


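Proof. (i): First note that [3, Corollary 11.15(ii)] asserts that $u_\lambda$ is well defined, since $F + \lambda G$ is
proper and lower semicontinuous, and, by Remark 2.2(ii), strictly convex and coercive. Furthermore
[3, Corollary 26.3(vii)] implies that $-\nabla F(u_\lambda) \in \lambda\,\partial G(u_\lambda)$. We derive from (2.3) that $A^*\colon L^2(P_{\mathcal{X}}) \to
\ell^2(\mathbb{K})\colon f \mapsto \mathsf{E}_{P_{\mathcal{X}}}(f\Phi)$, and hence, since $F = R \circ A$,

$$(\forall u \in \ell^2(\mathbb{K})) \quad \nabla F(u) = A^* \nabla R(f_u) = \mathsf{E}_P(\psi), \tag{3.10}$$

where $\psi\colon (x, y) \mapsto 2(f_u(x) - y)\Phi(x)$. Let $(x, y) \in \mathcal{X} \times \mathcal{Y}$. Then

$$|f_{u_\lambda}(x) - y| \le |f_{u_\lambda}(x)| + |y| \le \sum_{k \in \mathbb{K}} |\langle u_\lambda \mid e_k \rangle|\,|\phi_k(x)| + b \le \kappa\|u_\lambda\|_2 + b \tag{3.11}$$

and hence $\|\Psi_\lambda(x, y)\|_2 \le 2|f_{u_\lambda}(x) - y|\,\|\Phi(x)\|_2 \le 2(\kappa\|u_\lambda\|_2 + b)\kappa$. Moreover,

$$\int_{\mathcal{X} \times \mathcal{Y}} \|\Psi_\lambda(x, y)\|_2^2 \, dP(x, y) \le \int_{\mathcal{X} \times \mathcal{Y}} \big(2\kappa|f_{u_\lambda}(x) - y|\big)^2 dP(x, y) = 4\kappa^2 R(f_{u_\lambda}). \tag{3.12}$$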

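(ii): Let $F_n\colon \ell^2(\mathbb{K}) \to \mathbb{R}_+\colon u \mapsto (1/n)\sum_{i=1}^{n} |f_u(x_i) - y_i|^2$. Since the restriction of $G$ to $\ell^r(\mathbb{K})$ is in
$\Gamma_0(\ell^r(\mathbb{K}))$ by Lemma A.1, Ekeland's variational principle [3, Theorem 1.45] implies that there exists
$v \in \ell^r(\mathbb{K})$ such that $\|u_{n,\lambda}(z_n) - v\|_r \le \sqrt{\epsilon(\lambda)}$ and $\inf \|\partial(F_n + \lambda G)(v)\|_{r^*} \le \sqrt{\epsilon(\lambda)}$. Using the inequality
$a^2 - b^2 \ge 2(a - b)b$, we derive from definitions (3.8) and (2.4) that, for every $i \in \{1, \ldots, n\}$,

$$\begin{aligned}
\sum_{k \in \mathbb{K}} \langle v - u_\lambda \mid e_k \rangle \langle \Psi_\lambda(x_i, y_i) \mid e_k \rangle &= \sum_{k \in \mathbb{K}} \langle v - u_\lambda \mid e_k \rangle\, 2\big(f_{u_\lambda}(x_i) - y_i\big)\phi_k(x_i) \\
&= 2\big(f_v(x_i) - f_{u_\lambda}(x_i)\big)\big(f_{u_\lambda}(x_i) - y_i\big) \\
&\le \big(y_i - f_v(x_i)\big)^2 - \big(y_i - f_{u_\lambda}(x_i)\big)^2
\end{aligned} \tag{3.13}$$

and, summing over $i$ and dividing by $n$, we obtain

$$F_n(v) - F_n(u_\lambda) \ge \sum_{k \in \mathbb{K}} \langle v - u_\lambda \mid e_k \rangle \Big\langle \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) \,\Big|\, e_k \Big\rangle. \tag{3.14}$$

Lemma 3.1 and (i) yield

$$\lambda G(v) - \lambda G(u_\lambda) \ge \langle v - u_\lambda \mid -\mathsf{E}_P(\Psi_\lambda) \rangle + \frac{\lambda \eta M \|v - u_\lambda\|_r^2}{(\|u_\lambda\|_r + \|v - u_\lambda\|_r)^{2-r}}. \tag{3.15}$$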


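Next, since $\inf \|\partial(F_n + \lambda G)(v)\|_{r^*} \le \sqrt{\epsilon(\lambda)}$, there exists $e^* \in \ell^{r^*}(\mathbb{K})$ such that $\|e^*\|_{r^*} \le \sqrt{\epsilon(\lambda)}$ and
$\langle u_\lambda - v \mid e^* \rangle \le (F_n + \lambda G)(u_\lambda) - (F_n + \lambda G)(v)$. Summing inequalities (3.14) and (3.15), we have

$$\begin{aligned}
\sqrt{\epsilon(\lambda)}\,\|v - u_\lambda\|_r &\ge (F_n + \lambda G)(v) - (F_n + \lambda G)(u_\lambda) \\
&\ge \sum_{k \in \mathbb{K}} \langle v - u_\lambda \mid e_k \rangle \Big\langle \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \,\Big|\, e_k \Big\rangle + \frac{\lambda \eta M \|v - u_\lambda\|_r^2}{(\|u_\lambda\|_r + \|v - u_\lambda\|_r)^{2-r}}.
\end{aligned} \tag{3.16}$$

Hence, using Hölder's inequality,

$$\frac{\lambda \eta M \|v - u_\lambda\|_r^2}{(\|u_\lambda\|_r + \|v - u_\lambda\|_r)^{2-r}} \le \|v - u_\lambda\|_r \Big( \Big\| \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \Big\|_{r^*} + \sqrt{\epsilon(\lambda)} \Big) \tag{3.17}$$

and the statement follows from the fact that $\|\cdot\|_{r^*} \le \|\cdot\|_2$. □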

We recall the following concentration inequality in Hilbert spaces [32] and give the proof of the 
main result of this section. 
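
Lemma 3.4 (Bernstein's inequality) Let $(U_i)_{1 \le i \le n}$ be a finite sequence of i.i.d. random variables on a
probability space $(\Omega, \mathfrak{A}, \mathsf{P})$ and taking values in a real separable Hilbert space $\mathcal{H}$. Let $\beta > 0$, let $\sigma > 0$,
and suppose that $\max_{1 \le i \le n} \|U_i\| \le \beta$ and that $\mathsf{E}_{\mathsf{P}}\|U_i\|^2 \le \sigma^2$. Then for every $\tau > 0$ and every integer
$n \ge 1$

$$\mathsf{P}\Big[ \Big\| \frac{1}{n} \sum_{i=1}^{n} (U_i - \mathsf{E}_{\mathsf{P}} U_i) \Big\| \le \frac{2\sigma}{\sqrt{n}} + 4\sigma\sqrt{\frac{\tau}{n}} + \frac{4\beta\tau}{3n} \Big] \ge 1 - e^{-\tau}. \tag{3.18}$$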


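Proof [of Proposition 2.3]. (i): For every $f \in C$, $R(f) = \|f - f^\dagger\|_{L^2}^2 + \inf R(L^2(P_{\mathcal{X}}))$. Therefore,
minimizing $R$ over $C$ amounts to finding the element of $C$ which is nearest to $f^\dagger$ in $L^2(P_{\mathcal{X}})$.

(ii): It follows from (i) that $\inf R(C) = \|f_C - f^\dagger\|_{L^2}^2 + \inf R(L^2(P_{\mathcal{X}}))$. Therefore, since for every
$f \in C$, $\langle f - f_C \mid f^\dagger - f_C \rangle \le 0$, we have $R(f) - \inf R(C) = \|f - f^\dagger\|_{L^2}^2 - \|f_C - f^\dagger\|_{L^2}^2 = \|f - f_C\|_{L^2}^2 +
2\langle f - f_C \mid f_C - f^\dagger \rangle \ge \|f - f_C\|_{L^2}^2$.

(iii): Let $f \in C$. Using the fact that, for every $(a, b, c) \in \mathbb{R}_+^3$ with $a \ge b$, $\sqrt{a + c} - \sqrt{b + c} \le \sqrt{a} - \sqrt{b}$,
we derive that

$$\sqrt{R(f)} - \sqrt{\inf R(C)} \le \|f - f^\dagger\|_{L^2} - \|f_C - f^\dagger\|_{L^2} \le \|f - f_C\|_{L^2}. \tag{3.19}$$

Therefore, using the inequality $a^2 - b^2 \le 2a(a - b)$, valid for $a \ge b \ge 0$, we obtain

$$\begin{aligned}
R(f) - \inf R(C) &\le 2\sqrt{R(f)}\,\|f - f_C\|_{L^2} \\
&= 2\big(\|f - f^\dagger\|_{L^2}^2 + \inf R(L^2(P_{\mathcal{X}}))\big)^{1/2} \|f - f_C\|_{L^2} \\
&\le 2\big((\|f - f_C\|_{L^2} + \|f_C - f^\dagger\|_{L^2})^2 + \inf R(L^2(P_{\mathcal{X}}))\big)^{1/2} \|f - f_C\|_{L^2} \\
&= 2\Big(\Big(\|f - f_C\|_{L^2} + \sqrt{\inf R(C) - \inf R(L^2(P_{\mathcal{X}}))}\Big)^2 + \inf R(L^2(P_{\mathcal{X}}))\Big)^{1/2} \|f - f_C\|_{L^2}.
\end{aligned} \tag{3.20}$$

□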


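Proof [of Theorem 2.4]. (i): Let $n \in \mathbb{N}^*$, let $z_n = (x_i, y_i)_{1 \le i \le n} \in (\mathcal{X} \times \mathcal{Y})^n$, and let $F_n\colon \ell^2(\mathbb{K}) \to
\mathbb{R}_+\colon u \mapsto (1/n)\sum_{i=1}^{n} |f_u(x_i) - y_i|^2$. Let $u_G \in \operatorname{Argmin} G$, let $\lambda \in \mathbb{R}_{++}$, and let
$\rho_\lambda = \max\big\{1,\ \|u_G\|_r,\ \big(2(b + \kappa\|u_G\| + 1)^2/(\eta M \lambda)\big)^{1/r}\big\}$. Since $F(u_G) \le (b + \kappa\|u_G\|)^2$ and $F_n(u_G) \le
(b + \kappa\|u_G\|)^2$, from the definition of $\rho_\lambda$ and Proposition 3.2(ii) we derive that $\|u_\lambda - u_G\|_r \le \rho_\lambda$ and
$\|u_{n,\lambda}(z_n) - u_G\|_r \le \rho_\lambda$. It follows from Theorem 3.3 that there exist $\Psi_\lambda\colon \mathcal{X} \times \mathcal{Y} \to \ell^2(\mathbb{K})$ and
$v \in \ell^r(\mathbb{K})$ such that $\|v - u_{n,\lambda}(z_n)\|_r \le \sqrt{\epsilon(\lambda)}$ and

$$\frac{M\eta \|v - u_\lambda\|_r}{(\|u_\lambda\|_r + \|v - u_\lambda\|_r)^{2-r}} \le \frac{1}{\lambda} \Big( \Big\| \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \Big\|_2 + \sqrt{\epsilon(\lambda)} \Big). \tag{3.21}$$

Therefore

$$\|v - u_\lambda\|_r \le \frac{(4\rho_\lambda)^{2-r}}{M\eta\lambda} \Big( \Big\| \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \Big\|_2 + \sqrt{\epsilon(\lambda)} \Big). \tag{3.22}$$

Thus,

$$\|u_{n,\lambda}(z_n) - u_\lambda\|_r \le \sqrt{\epsilon(\lambda)} + \frac{(4\rho_\lambda)^{2-r}}{M\eta\lambda} \Big( \Big\| \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(x_i, y_i) - \mathsf{E}_P(\Psi_\lambda) \Big\|_2 + \sqrt{\epsilon(\lambda)} \Big). \tag{3.23}$$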
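
Now, consider the i.i.d. random vectors $\Psi_\lambda(X_i, Y_i)\colon \Omega \to \ell^2(\mathbb{K})$, for $1 \le i \le n$. It follows from
Theorem 3.3(i) that $\max_{1 \le i \le n} \|\Psi_\lambda(X_i, Y_i)\|_2 \le 2\kappa(\kappa\rho_\lambda + b)$ and that $\max_{1 \le i \le n} \mathsf{E}_{\mathsf{P}}\|\Psi_\lambda(X_i, Y_i)\|_2^2 \le
4\kappa^2 R(f_{u_\lambda})$. Now set $\beta_\lambda = 2\kappa(\kappa\rho_\lambda + b)$ and $\sigma_\lambda^2 = 4\kappa^2 R(f_{u_\lambda})$. Then Bernstein's inequality in Hilbert
spaces (Lemma 3.4) gives

$$(\forall \tau \in \mathbb{R}_{++}) \quad \mathsf{P}\Big[ \Big\| \mathsf{E}_P(\Psi_\lambda) - \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(X_i, Y_i) \Big\|_2 \le \delta(n, \lambda, \tau) \Big] \ge 1 - e^{-\tau}, \tag{3.24}$$

where $\delta(n, \lambda, \tau) = 2\sigma_\lambda/\sqrt{n} + 4\sigma_\lambda\sqrt{\tau/n} + 4\beta_\lambda\tau/(3n)$. Thus, recalling (3.23), we have

$$\mathsf{P}\Big[ \|u_{n,\lambda}(Z_n) - u_\lambda\|_r > \sqrt{\epsilon(\lambda)} + \frac{(4\rho_\lambda)^{2-r}}{M\eta\lambda} \big( \delta(n, \lambda, \tau) + \sqrt{\epsilon(\lambda)} \big) \Big] \le e^{-\tau}. \tag{3.25}$$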


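Set $\gamma_0 = 2(b + \kappa\|u_G\| + 1)^2$ and $\gamma_1 = 4^{2-r}/(\eta M)^{2/r}$. We note that, since $\sigma_\lambda$ is bounded, say by
$\gamma_2$, for $\lambda < 1$ sufficiently small, we have

$$\begin{aligned}
\frac{(4\rho_\lambda)^{2-r}}{M\eta\lambda} \big( \delta(n, \lambda, \tau) + \sqrt{\epsilon(\lambda)} \big) &= \frac{4^{2-r}}{M\eta\lambda} \Big( \frac{\gamma_0}{\eta M \lambda} \Big)^{2/r - 1} \Big( \frac{2\sigma_\lambda}{\sqrt{n}} + 4\sigma_\lambda\sqrt{\frac{\tau}{n}} + \frac{4\beta_\lambda\tau}{3n} + \sqrt{\epsilon(\lambda)} \Big) \\
&\le \gamma_1 \gamma_0^{2/r - 1} \Big( \frac{2\gamma_2}{\lambda^{2/r} n^{1/2}} + \frac{4\gamma_2\sqrt{\tau}}{\lambda^{2/r} n^{1/2}} + \frac{8\tau\kappa^2 (\gamma_0/(\eta M))^{1/r}}{3n\lambda^{3/r}} + \frac{8\tau\kappa b}{3n\lambda^{2/r}} + \frac{\sqrt{\epsilon(\lambda)}}{\lambda^{2/r}} \Big).
\end{aligned} \tag{3.26}$$

Therefore, since $1/(\lambda_n^{2/r} n^{1/2}) \to 0$ and $\sqrt{\epsilon(\lambda_n)}/\lambda_n^{2/r} \to 0$, it follows that

$$\frac{(4\rho_{\lambda_n})^{2-r}}{M\eta\lambda_n} \big( \delta(n, \lambda_n, \tau) + \sqrt{\epsilon(\lambda_n)} \big) \to 0 \tag{3.27}$$

and hence, in view of (3.25), we get $\|u_{n,\lambda_n}(Z_n) - u_{\lambda_n}\|_r \to 0$ in probability. Moreover, using Proposition 2.3(ii),

$$\begin{aligned}
\|f_n - f_C\|_{L^2} &\le \|A u_{n,\lambda_n}(Z_n) - A u_{\lambda_n}\|_{L^2} + \|A u_{\lambda_n} - f_C\|_{L^2} \\
&\le \|A\|\,\|u_{n,\lambda_n}(Z_n) - u_{\lambda_n}\|_r + \sqrt{F(u_{\lambda_n}) - \inf F(\operatorname{dom} G)}.
\end{aligned} \tag{3.28}$$

Since $F(u_{\lambda_n}) - \inf F(\operatorname{dom} G) \to 0$ by Proposition 3.2(iii), and $\|u_{n,\lambda_n}(Z_n) - u_{\lambda_n}\|_r \to 0$ in probability,
we derive that $\|f_n - f_C\|_{L^2} \to 0$ in probability.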

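(ii): Let $n \in \mathbb{N}^*$, let $\eta \in \mathbb{R}_{++}$, and set

$$\Omega_{n,\eta} = \Big\{ \|f_n - f_C\|_{L^2} > \|A\|\eta + \sqrt{F(u_{\lambda_n}) - \inf F(\operatorname{dom} G)} \Big\}. \tag{3.29}$$

Since $\epsilon(\lambda_n) = O(1/n)$, it follows from (3.26) that there exists $\gamma_3 \in \mathbb{R}_{++}$ such that, for every $\tau \in
\,]1, +\infty[$, and every $n \in \mathbb{N}^*$,

$$\sqrt{\epsilon(\lambda_n)} + \frac{(4\rho_{\lambda_n})^{2-r}}{\eta M \lambda_n} \big( \delta(n, \lambda_n, \tau) + \sqrt{\epsilon(\lambda_n)} \big) \le \frac{\gamma_3 \tau}{\lambda_n^{2/r} n^{1/2}}. \tag{3.30}$$

Let $\zeta \in\, ]1, +\infty[$. There exists $\bar{n} \in \mathbb{N}^*$ such that, for every integer $n \ge \bar{n}$,

$$\frac{\gamma_3 \zeta \log n}{\lambda_n^{2/r} n^{1/2}} \le \eta. \tag{3.31}$$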


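Therefore, it follows from (3.25), (3.28), (3.30), and (3.31) that, for $n$ large enough,

$$\mathsf{P}\,\Omega_{n,\eta} \le \mathsf{P}\big[ \|u_{n,\lambda_n}(Z_n) - u_{\lambda_n}\|_r > \eta \big] \le \exp(-\zeta \log n) = n^{-\zeta}. \tag{3.32}$$

Thus, $\sum_{n \ge \bar{n}} \mathsf{P}\,\Omega_{n,\eta} < +\infty$ and we derive from the Borel-Cantelli lemma that $\mathsf{P}\big(\bigcap_{k \ge \bar{n}} \bigcup_{n \ge k} \Omega_{n,\eta}\big) = 0$.

Recalling Proposition 3.2(iii), we conclude that $\|f_n - f_C\|_{L^2} \to 0$ $\mathsf{P}$-a.s.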


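(iii): First note that Proposition 3.2(iii) implies that $u^\dagger$ is well defined and that $\rho =
\sup_{\lambda \in \mathbb{R}_{++}} \|u_\lambda\|_r < +\infty$. Now, let $\lambda \in \mathbb{R}_{++}$ and let $n \in \mathbb{N}^*$. Since $\|u_\lambda\|_r \le \rho$, arguing as in the
proof of (i), we obtain

$$\mathsf{P}\Big[ \|u_{n,\lambda}(Z_n) - u_\lambda\|_r > \sqrt{\epsilon(\lambda)} + \frac{(4\rho)^{2-r}}{M\eta\lambda} \big( \delta(n, \tau) + \sqrt{\epsilon(\lambda)} \big) \Big] \le e^{-\tau}, \tag{3.33}$$

where $\alpha = 2\kappa(\kappa\rho + b)$ and $\delta(n, \tau) = 4\alpha/\sqrt{n} + 4\alpha\sqrt{\tau/n} + 4\alpha\tau/(3n)$.

(iii)(a): Since $1/(\lambda_n n^{1/2}) \to 0$, we have $(1/\lambda_n)\delta(n, \tau) \to 0$ and hence, in view of (3.33),
$\|u_{n,\lambda_n}(Z_n) - u_{\lambda_n}\|_r \to 0$ in probability;
the statement follows by Proposition 3.2(iv).

(iii)(b): The proof follows the same line as that of (ii). □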

4 Algorithm 

The goal of this section is to prove Theorem 2.7 and Theorem 2.11. The proof of Theorem 2.7 is 
based on the following fact. 

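Lemma 4.1 [30, Lemma 4.1] Let $\mathcal{H}$ be a real Hilbert space, let $\beta \in \mathbb{R}_{++}$, and let $\delta \in \mathbb{R}_+$. Let $F\colon \mathcal{H} \to
\,]-\infty, +\infty]$ be a convex differentiable function with $\beta$-Lipschitz continuous gradient, and let $G \in \Gamma_0(\mathcal{H})$.
Then, for every $(u, v, w) \in \mathcal{H}^3$ and every $v^* \in \partial_\delta G(v)$,

$$(F + G)(v) \le (F + G)(u) + \langle v - u \mid \nabla F(w) + v^* \rangle + \frac{\beta}{2}\|v - w\|^2 + \delta. \tag{4.1}$$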


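Proof [of Theorem 2.7]. Let $m \in \mathbb{N}$ and set

$$\tilde{v}_m = \operatorname{prox}_{\gamma_m G}\big(u_m - \gamma_m(\nabla F(u_m) + b_m)\big). \tag{4.2}$$

Since

$$v_m \in \operatorname{Argmin}^{\delta_m^2/2}_{w \in \mathcal{H}} \Big( \gamma_m G(w) + \frac{1}{2}\big\|w - \big(u_m - \gamma_m(\nabla F(u_m) + b_m)\big)\big\|^2 \Big), \tag{4.3}$$

using the strong convexity of the objective function in (4.3), we get

$$\|v_m - \tilde{v}_m\| \le \delta_m. \tag{4.4}$$

Therefore, setting $a_m = v_m - \tilde{v}_m$, we have

$$u_{m+1} = u_m + \tau_m\Big( \operatorname{prox}_{\gamma_m G}\big(u_m - \gamma_m(\nabla F(u_m) + b_m)\big) + a_m - u_m \Big). \tag{4.5}$$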

Hence (2.15) is an instance of the inexact forward-backward algorithm studied in [12] and we can
therefore use the results of [12, Theorem 3.4].

(i)-(ii): The statements follow from [12, Theorem 3.4(i)-(ii)].

(iii): We have

$$\begin{aligned}
\|u_m - v_m\|^2 &\le 2\|u_m - \tilde{v}_m\|^2 + 2\|a_m\|^2 \\
&\le 4\big\|u_m - \operatorname{prox}_{\gamma_m G}\big(u_m - \gamma_m \nabla F(u_m)\big)\big\|^2 + 4\gamma_m^2\|b_m\|^2 + 2\|a_m\|^2.
\end{aligned} \tag{4.6}$$

Therefore, the statement follows from [12, Theorem 3.4(iii)].

(iv): By (4.3) and [27, Lemma 1], there exist G [0,+oo[, J 2 ,m € [0,+oo[, and with 

+ ^im ^ and ||em|| ^ S 2 ,m such that 


- {VF{um) + bm) + ^G 

im 1 m 


Now set J = F + G. It follows from Lemma 4.1 that, for every u &T-L, 


(4.7) 


JiVm) - J{u) 


/3|I_ _ ||2 I 


^ (Vra U I VT'('Um) T 4” o ll'^m "^^mH T „ 

1 Z'ji 

— \Vm U \ Um Vm) + „ 11 'am Um\\ 

7m 2 


I 1 / I h \ \ ^t,m 

H- {Vm-U \ Cm- ImOm) + TT^ 

7m 


27 , 


1 71 ||2 II ||2t^l/« ^ ,n 

<"Ujn - rt|| - ll^m -u\\ j + - p-) ll'am “ Wm| 

' 7m 


m 


(4.8) 


I ^ I „ J. \ I ^t,m 

T {Vm ai I Cm ^mbm) T „ 

7m ^1m 

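We derive from (i) and (iii) that $(\langle v_m - u \mid u_m - v_m\rangle)_{m\in\mathbb{N}}$ is square summable. Therefore, if we let
$u \in \operatorname{Argmin} J$, it follows from (4.8) that $(J(v_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$ is square summable. Now, if we let
$u = u_m$ in (4.8), we have

\[ \begin{aligned} J(v_m) - J(u_m) &\le \Big(\frac{\beta}{2} - \frac{1}{\gamma_m}\Big)\|u_m - v_m\|^2 + \frac{1}{\gamma_m}\Big(\|u_m - v_m\|\,\|e_m - \gamma_m b_m\| + \frac{\delta_{1,m}^2}{2}\Big) \\ &\le \frac{1}{\gamma_m}\Big(\|u_m - v_m\|\,\|e_m - \gamma_m b_m\| + \frac{\delta_{1,m}^2}{2}\Big). \end{aligned} \tag{4.9} \]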


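Set $\underline{\gamma} = \inf_{m\in\mathbb{N}} \gamma_m$. Since $u_{m+1} = u_m + \tau_m(v_m - u_m)$, using the convexity of $J$ and (4.9), we get

\[ \begin{aligned} J(u_{m+1}) - \inf J(\mathcal{H}) &\le J(u_m) - \inf J(\mathcal{H}) + \tau_m\big(J(v_m) - J(u_m)\big) \\ &\le J(u_m) - \inf J(\mathcal{H}) + \frac{1}{\underline{\gamma}}\Big(\|u_m - v_m\|\,\|e_m - \gamma_m b_m\| + \frac{\delta_{1,m}^2}{2}\Big). \end{aligned} \tag{4.10} \]

Thus, since $(\|u_m - v_m\|\,\|e_m - \gamma_m b_m\| + \delta_{1,m}^2/2)_{m\in\mathbb{N}}$ is summable, [26, Lemma 2.2.2] ensures that
$(J(u_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$ converges. In view of the inequalities in (4.10), its limit must be 0.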
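(v): Let $u \in \operatorname{Argmin} J$. Since $u_m - u = (1 - \tau_{m-1})(u_{m-1} - u) + \tau_{m-1}(v_{m-1} - u)$, it follows from
the convexity of $\|\cdot\|^2$ that

\[ \|u_m - u\|^2 - \|v_m - u\|^2 \le (1 - \tau_{m-1})\|u_{m-1} - u\|^2 + \|v_{m-1} - u\|^2 - \|v_m - u\|^2. \tag{4.11} \]

Therefore, it follows from (4.8) that

\[ \begin{aligned} 0 \le J(v_m) - J(u) &\le \frac{1 - \tau_{m-1}}{2\gamma_m}\|u_{m-1} - u\|^2 + \frac{1}{2\gamma_m}\big(\|v_{m-1} - u\|^2 - \|v_m - u\|^2\big) \\ &\quad + \Big(\frac{\beta}{2} - \frac{1}{2\gamma_m}\Big)\|u_m - v_m\|^2 + \frac{1}{\gamma_m}\Big(\|v_m - u\|\,\|e_m - \gamma_m b_m\| + \frac{\delta_{1,m}^2}{2}\Big). \end{aligned} \tag{4.12} \]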


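Hence, $(J(v_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$ is summable, for each term on the right hand side of (4.12) is
summable. Since $u_{m+1} = (1 - \tau_m)u_m + \tau_m v_m$, convexity of $J$ yields

\[ 0 \le J(u_{m+1}) - \inf J(\mathcal{H}) \le (1 - \tau_m)\big(J(u_m) - \inf J(\mathcal{H})\big) + \tau_m\big(J(v_m) - \inf J(\mathcal{H})\big). \tag{4.13} \]

The summability of $(1 - \tau_m)_{m\in\mathbb{N}}$ and $(J(v_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$ implies that of $(J(u_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$.

(vi): Since $(J(u_m) - \inf J(\mathcal{H}))_{m\in\mathbb{N}}$ is summable, it follows from (4.10) and [15, Lemma 3] that
$J(u_m) - \inf J(\mathcal{H}) = o(1/m)$. □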

The purpose of the rest of the section is to show how approximations of the type considered in 
Theorem 2.7 (equation (2.15)) can be computed explicitly. 
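
To make the structure of the iteration analyzed above concrete, here is a minimal sketch of an inexact forward-backward loop of the form $u_{m+1} = u_m + \tau_m(v_m - u_m)$, where $v_m$ is a perturbed proximal step. The choices below are assumptions made for illustration only and are not taken from the paper: $F$ is a least-squares term, $G$ is a multiple of the $\ell^1$ norm (so the exact proximal step is soft thresholding), the step size is constant, and the proximal error `a_m` is random with summable norms. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximity operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def inexact_forward_backward(A, y, lam, n_iter=500, tau=1.0,
                             error_scale=1.0, seed=0):
    """Sketch of u_{m+1} = u_m + tau * (v_m - u_m) with a perturbed prox step,
    for F(u) = (1/n) * ||A u - y||^2 and G = lam * ||u||_1 (illustrative choices)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    beta = 2.0 * np.linalg.norm(A, 2) ** 2 / n   # Lipschitz constant of grad F
    gamma = 1.0 / beta                           # constant step size
    u = np.zeros(A.shape[1])
    values = []
    for m in range(n_iter):
        grad = (2.0 / n) * (A.T @ (A @ u - y))
        # Hypothetical proximal error a_m with summable norms (decay 1/(m+1)^3).
        a_m = error_scale * rng.uniform(-1.0, 1.0, size=u.shape) / (m + 1) ** 3
        v = soft_threshold(u - gamma * grad, gamma * lam) + a_m
        u = u + tau * (v - u)
        values.append(np.mean((A @ u - y) ** 2) + lam * np.abs(u).sum())
    return u, np.array(values)
```

Running the loop once with `error_scale=1.0` and once with `error_scale=0.0` on the same data gives a quick empirical check that summable errors do not change the limiting objective value.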


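Lemma 4.2 Let $h\colon \mathbb{R} \to \mathbb{R}$ be convex and such that $0 \in \operatorname{Argmin} h$, let $(s, \mu) \in \mathbb{R}^2$, and let $a \in [-1,1]$.
Let $\beta \in \mathbb{R}_+$ be the Lipschitz constant of $h$ in $[-1 - |\mu|, 1 + |\mu|]$ and set

\[ \delta = \sqrt{(2\beta + 2|\mu| + 1)|a|} \quad\text{and}\quad s = \operatorname{prox}_h \mu + a. \tag{4.14} \]

Then $s \simeq_\delta \operatorname{prox}_h \mu$. Moreover, $s' = \operatorname{sign}(\mu)\max\{0, \operatorname{sign}(\mu)s\}$ satisfies $s' \simeq_\delta \operatorname{prox}_h \mu$ and $\mu s' \ge 0$.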


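Proof. Let $t = \operatorname{prox}_h \mu$. Since $0 \in \operatorname{Argmin} h$, $\operatorname{prox}_h 0 = 0$. Hence, since $\operatorname{prox}_h$ is nonexpansive and
increasing [12, Lemma 2.4], $|t| \le |\mu|$ and $\operatorname{sign}(t) = \operatorname{sign}(\mu)$. We note that $|s| \le |s - t| + |t| \le 1 + |\mu|$.
Thus,

\[ \begin{aligned} h(s) + \frac{1}{2}|s - \mu|^2 - h(t) - \frac{1}{2}|t - \mu|^2 &\le \beta|s - t| + \frac{1}{2}|s - t|\,|s - \mu + t - \mu| \\ &\le \frac{1}{2}(2\beta + 1 + 2|\mu|)|a|. \end{aligned} \tag{4.15} \]

To conclude, it is enough to note that $|s' - \operatorname{prox}_h(\mu)| \le |a|$. □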
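
The tolerance in (4.14) and the sign correction $s'$ can be checked numerically on a simple instance. The sketch below assumes $h = |\cdot|$, which is convex with $0 \in \operatorname{Argmin} h$ and Lipschitz constant $\beta = 1$, so that $\operatorname{prox}_h$ is soft thresholding at level 1. The helper names are hypothetical and the example only illustrates the statement; it is not part of the proof.

```python
def sign(x):
    """Sign function with sign(0) = 0."""
    return (x > 0) - (x < 0)

def prox_abs(mu):
    """Exact proximity operator of h = |.| (soft thresholding at level 1)."""
    return sign(mu) * max(abs(mu) - 1.0, 0.0)

def tolerance_sq(beta, mu, a):
    """delta^2 as defined in (4.14)."""
    return (2.0 * beta + 2.0 * abs(mu) + 1.0) * abs(a)

def sign_corrected(s, mu):
    """s' = sign(mu) * max{0, sign(mu) * s}."""
    return sign(mu) * max(0.0, sign(mu) * s)

def excess(s, mu):
    """Amount by which s fails to minimize h + (1/2)|. - mu|^2, for h = |.|."""
    t = prox_abs(mu)
    return abs(s) + 0.5 * (s - mu) ** 2 - abs(t) - 0.5 * (t - mu) ** 2
```

For any $\mu$ and any $a \in [-1,1]$, both `excess(prox_abs(mu) + a, mu)` and the excess of the sign-corrected point should stay below `tolerance_sq(1.0, mu, a) / 2`.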

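Lemma 4.3 Let $h \in \Gamma_0(\mathbb{R})$, let $\sigma \in \Gamma_0(\mathbb{R})$ be a support function, and set $f = h + \sigma$. Let $(s,x) \in \mathbb{R}^2$ be
such that $s \operatorname{prox}_\sigma(x) \ge 0$, and let $\delta \in \mathbb{R}_+$. Then

\[ s \simeq_\delta \operatorname{prox}_h(\operatorname{prox}_\sigma x) \;\Rightarrow\; s \simeq_\delta \operatorname{prox}_f x. \tag{4.16} \]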
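Proof. Let $\mu = \operatorname{prox}_\sigma x$ and $s \simeq_\delta \operatorname{prox}_h(\operatorname{prox}_\sigma x)$. By [27, Lemma 2.4] there exist $(\delta_1, \delta_2) \in \mathbb{R}_+^2$ and
$e \in \mathbb{R}$, such that

\[ \mu - s + e \in \partial_{\delta_1^2/2} h(s), \quad |e| \le \delta_2, \quad\text{and}\quad \delta_1^2 + \delta_2^2 \le \delta^2. \tag{4.17} \]

Hence

\[ x - s + e = x - \mu + \mu - s + e \in \partial\sigma(\mu) + \partial_{\delta_1^2/2} h(s). \tag{4.18} \]

Since $s\mu \ge 0$, there exists $t \in \mathbb{R}_+$ such that $\mu = ts$. Moreover, since $\sigma$ is positively homogeneous,
$\partial\sigma(ts) \subset \partial\sigma(s)$. Therefore $x - s + e \in \partial\sigma(s) + \partial_{\delta_1^2/2} h(s) \subset \partial_{\delta_1^2/2} f(s)$, which implies that $s \simeq_\delta \operatorname{prox}_f x$
by [27, Lemma 2.4]. □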

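Remark 4.4 Let $h \in \Gamma_0(\mathbb{R})$, let $(s,\mu) \in \mathbb{R}^2$, and let $\delta \in \mathbb{R}_{++}$. Suppose that $0 \in \operatorname{Argmin} h$ and that
$s \simeq_\delta \operatorname{prox}_h \mu$ with $\delta \le |s|$. Then $s\mu \ge 0$. Indeed, since $h(0) = \inf h(\mathbb{R})$, we have

\[ h(s) + \frac{1}{2}|s - \mu|^2 \le h(0) + \frac{1}{2}\mu^2 + \frac{\delta^2}{2} \le h(s) + \frac{1}{2}\mu^2 + \frac{\delta^2}{2} \tag{4.19} \]

and hence $0 \le (1/2)(s^2 - \delta^2) \le s\mu$. This shows that Lemma 4.3, when $\delta = 0$, gives $\operatorname{prox}_{h+\sigma} = \operatorname{prox}_h \circ \operatorname{prox}_\sigma$
and consequently generalizes [9, Proposition 3.6], also relaxing the condition on the
differentiability of $h$ at 0. With the help of this result one can compute general thresholding operators
as the proximity operator of $\sigma_D + \eta|\cdot|^r$. Figure 1 depicts some instances of these thresholders (see also
[9]).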

The following lemma is an error-tolerant version of [10, Proposition 12]. 

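Lemma 4.5 Let $\phi \in \Gamma_0(\mathbb{R})$, let $(s, x, p) \in \mathbb{R}^3$, let $\delta \in \mathbb{R}_+$, and let $C \subset \mathbb{R}$ be a nonempty closed interval.
Then

\[ s \simeq_\delta \operatorname{prox}_\phi x \quad\text{and}\quad p = \operatorname{proj}_C s \;\Rightarrow\; p \simeq_\delta \operatorname{prox}_{\phi + \iota_C} x. \tag{4.20} \]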

Proof. Let $g = \phi + (1/2)(\cdot - x)^2$ and let $\varepsilon = \delta^2/2$. Since $g$ is convex and $\bar{s} = \operatorname{prox}_\phi x$ is its minimizer,
$g$ is decreasing on $]-\infty, \bar{s}]$ and increasing on $[\bar{s}, +\infty[$. By definition $s$ is an $\varepsilon$-minimizer of $g$. The
statement is equivalent to the fact that $p$ is an $\varepsilon$-minimizer of $g + \iota_C$. If $s \in C$, then $p$ is a fortiori an
$\varepsilon$-minimizer of $g + \iota_C$. We now consider two cases. First suppose that $s < \inf C$. If $s < \inf C \le \bar{s}$,
then $\inf C$ is still an $\varepsilon$-minimizer of $g$ and $\inf C \in C$. Thus $p = \inf C$ is an $\varepsilon$-minimizer of $g + \iota_C$. If
either $\bar{s} \le s < \inf C$ or $s \le \bar{s} < \inf C$, we have $p = \operatorname{proj}_C s = \inf C$, which is the minimizer of $g + \iota_C$,
since $g$ is increasing on $[\bar{s}, +\infty[$. The second case $\sup C < s$ is treated likewise. □
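
In the exact case $\delta = 0$, Lemma 4.5 says that the proximity operator of $\phi + \iota_C$ on the real line is obtained by projecting $\operatorname{prox}_\phi x$ onto $C$. The following sketch checks this on one assumed instance, $\phi = |\cdot|$ and $C = [c_{\mathrm{lo}}, c_{\mathrm{hi}}]$, against a grid search over $C$. The function names and the grid resolution are illustrative assumptions.

```python
import numpy as np

def prox_abs(x):
    """Proximity operator of phi = |.| (soft thresholding at level 1)."""
    return np.sign(x) * max(abs(x) - 1.0, 0.0)

def prox_abs_on_interval(x, c_lo, c_hi):
    """proj_C applied to prox_phi(x), with C = [c_lo, c_hi]."""
    return float(np.clip(prox_abs(x), c_lo, c_hi))

def brute_force_constrained_prox(x, c_lo, c_hi, n=200001):
    """Grid minimization of |s| + (1/2)(s - x)^2 over s in [c_lo, c_hi]."""
    s = np.linspace(c_lo, c_hi, n)
    return float(s[np.argmin(np.abs(s) + 0.5 * (s - x) ** 2)])
```

With $C = [0.5, 2]$, the inputs $x = 0$, $x = 2$, and $x = 5$ cover the three situations of the proof: the unconstrained minimizer lies to the left of $C$, inside $C$, and to the right of $C$.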

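Proposition 4.6 Let $\mathcal{H}$ be a separable real Hilbert space and let $(o_k)_{k\in\mathbb{K}}$ be an orthonormal basis of $\mathcal{H}$,
where $\mathbb{K}$ is an at most countable set. Let $(h_k)_{k\in\mathbb{K}}$ be a family of convex functions from $\mathbb{R}$ to $\mathbb{R}$ such that,
for every $k \in \mathbb{K}$, $h_k \ge h_k(0) = 0$. Let $(C_k)_{k\in\mathbb{K}}$ be a family of closed intervals in $\mathbb{R}$ such that $0 \in \bigcap_{k\in\mathbb{K}} C_k$, and
let $(D_k)_{k\in\mathbb{K}}$ be a family of nonempty closed bounded intervals in $\mathbb{R}$. Suppose that $(h_k^*(-(\inf D_k)_+))_{k\in\mathbb{K}}$
and $(h_k^*((\sup D_k)_-))_{k\in\mathbb{K}}$ are summable, and set

\[ G\colon \mathcal{H} \to \left]-\infty,+\infty\right]\colon u \mapsto \sum_{k\in\mathbb{K}} g_k(\langle u \mid o_k\rangle), \quad\text{where}\quad g_k = \iota_{C_k} + \sigma_{D_k} + h_k. \tag{4.21} \]

Let $w \in \mathcal{H}$, let $(a_k)_{k\in\mathbb{K}} \in [-1,1]^{\mathbb{K}}$, let $(\delta_k)_{k\in\mathbb{K}} \in \ell^2(\mathbb{K})$, set $\delta = \sqrt{\sum_{k\in\mathbb{K}} \delta_k^2}$, and let

\[ \text{for every } k \in \mathbb{K} \quad \left\lfloor \begin{aligned} &\chi_k = \langle w \mid o_k\rangle \\ &\delta_k^2 \ge \big(4\gamma\max\{h_k(|\chi_k| + 2), h_k(-|\chi_k| - 2)\} + 2|\chi_k| + 1\big)|a_k| \\ &\pi_k = \operatorname{prox}_{\gamma h_k}(\operatorname{soft}_{\gamma D_k} \chi_k) + a_k \\ &\nu_k = \operatorname{proj}_{C_k}\big(\operatorname{sign}(\chi_k)\max\{0, \operatorname{sign}(\chi_k)\pi_k\}\big). \end{aligned} \right. \tag{4.22} \]

Now set $v = \sum_{k\in\mathbb{K}} \nu_k o_k$. Then $v \simeq_\delta \operatorname{prox}_{\gamma G} w$.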
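
The coordinatewise recipe above (soft thresholding with respect to $\gamma D_k$, a possibly perturbed proximal step for $\gamma h_k$, a sign correction, then a projection onto $C_k$) can be sketched for a finite index set as follows. This is only an illustration under assumptions that are not in the statement: all coordinates share the same intervals $C = [c_{\mathrm{lo}}, c_{\mathrm{hi}}]$ and $D = [d_{\mathrm{lo}}, d_{\mathrm{hi}}]$, $h_k = \eta|\cdot|^2$ so that $\operatorname{prox}_{\gamma h_k}$ has the closed form $x \mapsto x/(1 + 2\gamma\eta)$, and $\operatorname{soft}_{\gamma D}$ is taken to be $\operatorname{prox}_{\gamma\sigma_D} = \operatorname{Id} - \operatorname{proj}_{\gamma D}$. The function name is hypothetical.

```python
import numpy as np

def composite_threshold(chi, gamma, eta, d_lo, d_hi, c_lo, c_hi, a=None):
    """Coordinatewise sketch: proj_C(sign-corrected(prox_{gamma h}(soft_{gamma D}(chi)) + a)),
    with h = eta * |.|^2 (assumed), D = [d_lo, d_hi], and C = [c_lo, c_hi] containing 0."""
    chi = np.asarray(chi, dtype=float)
    a = np.zeros_like(chi) if a is None else np.asarray(a, dtype=float)
    # soft_{gamma D}: subtract the projection onto gamma * D.
    mu = chi - np.clip(chi, gamma * d_lo, gamma * d_hi)
    # Proximal step for gamma * h, perturbed by the errors a_k.
    pi = mu / (1.0 + 2.0 * gamma * eta) + a
    # Sign correction, so that the result never has the sign opposite to chi.
    sgn = np.sign(chi)
    corrected = sgn * np.maximum(0.0, sgn * pi)
    # Hard constraint: projection onto C.
    return np.clip(corrected, c_lo, c_hi)
```

With zero errors, $\gamma = 1/2$, $\eta = 1$, $D = [-1,1]$, and $C = [0,3]$, the input $(-2, 0.2, 4)$ is mapped to $(0, 0, 1.75)$: the first coordinate is removed by the constraint, the second by the threshold, and the third is shrunk.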


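Proof. The function $G$ lies in $\Gamma_0(\mathcal{H})$ as the composition of the linear isometry $\mathcal{H} \to \ell^2(\mathbb{K})\colon u \mapsto (\langle u \mid o_k\rangle)_{k\in\mathbb{K}}$ and the function

\[ \ell^2(\mathbb{K}) \to \left]-\infty,+\infty\right]\colon (\mu_k)_{k\in\mathbb{K}} \mapsto \sum_{k\in\mathbb{K}} g_k(\mu_k), \quad\text{with}\quad g_k = \iota_{C_k} + \sigma_{D_k} + h_k, \tag{4.23} \]

which belongs to $\Gamma_0(\ell^2(\mathbb{K}))$ by Lemma A.1. Now set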


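\[ (\forall k \in \mathbb{K}) \quad \left\lfloor \begin{aligned} &\mu_k = \operatorname{soft}_{\gamma D_k} \chi_k \\ &s_k = \operatorname{sign}(\mu_k)\max\{0, \operatorname{sign}(\mu_k)(\operatorname{prox}_{\gamma h_k} \mu_k + a_k)\} \\ &\nu_k = \operatorname{proj}_{C_k} s_k. \end{aligned} \right. \tag{4.24} \]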


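Let $k \in \mathbb{K}$. Since $\operatorname{soft}_{\gamma D_k}$ is nonexpansive and $2\gamma\max\{h_k(|\chi_k| + 2), h_k(-|\chi_k| - 2)\}$ is a Lipschitz
constant for $\gamma h_k$ on the interval $[-|\chi_k| - 1, |\chi_k| + 1]$, it follows from (4.24) and Lemma 4.2 that

\[ \begin{cases} \delta_k^2 = \big(4\gamma\max\{h_k(|\chi_k| + 2), h_k(-|\chi_k| - 2)\} + 2|\chi_k| + 1\big)|a_k| \\ s_k \simeq_{\delta_k} \operatorname{prox}_{\gamma h_k}(\operatorname{prox}_{\gamma\sigma_{D_k}} \chi_k) \\ s_k \operatorname{prox}_{\gamma\sigma_{D_k}} \chi_k \ge 0. \end{cases} \tag{4.25} \]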

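Thus, Lemma 4.3 yields

\[ s_k \simeq_{\delta_k} \operatorname{prox}_{\gamma(h_k + \sigma_{D_k})} \chi_k, \tag{4.26} \]

and, using Lemma 4.5, we obtain $\nu_k \simeq_{\delta_k} \operatorname{prox}_{\gamma g_k} \chi_k$. Hence, by Definition 2.6,

\[ \gamma g_k(\nu_k) + \frac{1}{2}|\nu_k - \chi_k|^2 \le \gamma g_k(\operatorname{prox}_{\gamma g_k} \chi_k) + \frac{1}{2}|\operatorname{prox}_{\gamma g_k} \chi_k - \chi_k|^2 + \frac{\delta_k^2}{2}. \tag{4.27} \]

On the other hand, we derive from [12, Example 2.19] and [9, Proposition 3.6] that

\[ \langle \operatorname{prox}_{\gamma G} w \mid o_k\rangle = \operatorname{prox}_{\gamma g_k} \chi_k. \tag{4.28} \]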

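Thus, summing the inequalities (4.27) over $k$, we obtain

\[ \begin{aligned} \gamma\sum_{k\in\mathbb{K}} g_k(\nu_k) + \frac{1}{2}\sum_{k\in\mathbb{K}} |\nu_k - \chi_k|^2 &\le \sum_{k\in\mathbb{K}} \Big(\gamma g_k(\langle \operatorname{prox}_{\gamma G} w \mid o_k\rangle) + \frac{1}{2}\big|\langle \operatorname{prox}_{\gamma G} w \mid o_k\rangle - \langle w \mid o_k\rangle\big|^2\Big) + \frac{1}{2}\sum_{k\in\mathbb{K}} \delta_k^2 \\ &= \gamma G(\operatorname{prox}_{\gamma G} w) + \frac{1}{2}\|\operatorname{prox}_{\gamma G} w - w\|^2 + \frac{1}{2}\sum_{k\in\mathbb{K}} \delta_k^2 \\ &< +\infty. \end{aligned} \tag{4.29} \]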
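Thus, (A.5) and (4.29) yield $(\nu_k)_{k\in\mathbb{K}} \in \ell^2(\mathbb{K})$ and one can find $v \in \mathcal{H}$ such that, for every $k \in \mathbb{K}$,
$\langle v \mid o_k\rangle = \nu_k$. Hence,

\[ \gamma G(v) + \frac{1}{2}\|v - w\|^2 \le \gamma G(\operatorname{prox}_{\gamma G} w) + \frac{1}{2}\|\operatorname{prox}_{\gamma G} w - w\|^2 + \frac{1}{2}\sum_{k\in\mathbb{K}} \delta_k^2 \tag{4.30} \]

and finally $v \simeq_\delta \operatorname{prox}_{\gamma G} w$, where $\delta = \sqrt{\sum_{k\in\mathbb{K}} \delta_k^2}$. □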

Proof. [of Theorem 2.11] (i): Lemma A.1 guarantees that $G \in \Gamma_0(\ell^2(\mathbb{K}))$, that $G$ is coercive, and that
$\operatorname{dom} G \subset \ell^r(\mathbb{K})$. The statement therefore follows from [3, Corollary 11.15(ii)].

(ii)-(iv): Let $F_n\colon \ell^2(\mathbb{K}) \to \mathbb{R}\colon u \mapsto (1/n)\sum_{i=1}^n (Au(x_i) - y_i)^2$. Then, for every $u \in \ell^2(\mathbb{K})$, $\nabla F_n(u) = (2/n)\sum_{i=1}^n (\langle u \mid \Phi(x_i)\rangle - y_i)\Phi(x_i)$. Hence, since $\|\Phi(x_i)\|_2 \le \kappa$, $\nabla F_n$ is Lipschitz continuous with
constant $2\kappa^2$. Therefore, the statement follows from Theorem 2.7 and Proposition 4.6. It remains to
show the convergence properties of {\\um — (ll^m — ^^llr)meN' focus on the sequence 

(||n„, — since {\\vm — f'^|lr)meN treated analogously. It follows from Lemma 3.1 and 

the convexity of Fn that 

(VmGN) {Fn + \G){um)-{Fn + \G){u)^ y\M\\u^-u\l 

(ll'^llr + \Wm - U\\^) 

Therefore, since {Fn + XG){um) — {Fn + XG){u) —^ 0 as m ^ +oo and ip: M+ ^ M: 1t^/(||tt|| +t)^“^ 
is strictly increasing with ip{0) = 0, we obtain \\um — u\\^ —)■ 0. Moreover, taking p G M++ such that 

SUPmeN {IMr + II Um llr)^ ^ (2.18) follows from (4.31). □ 
A An auxiliary result

The following result is a generalization of [12, Proposition 5.14]. 

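Lemma A.1 Let $\mathbb{K}$ be an at most countable set. For every $k \in \mathbb{K}$, let $C_k$ be a closed interval in $\mathbb{R}$ such that
$0 \in C_k$, let $D_k$ be a nonempty closed bounded interval in $\mathbb{R}$, and let $h_k \in \Gamma_0^+(\mathbb{R})$ be such that $h_k(0) = 0$.
Set

\[ G\colon \ell^2(\mathbb{K}) \to \left]-\infty,+\infty\right]\colon (\mu_k)_{k\in\mathbb{K}} \mapsto \sum_{k\in\mathbb{K}} g_k(\mu_k), \quad\text{where}\quad g_k = \iota_{C_k} + \sigma_{D_k} + h_k. \tag{A.1} \]

Let $r \in \left]1,2\right]$ and consider the following statements:

(a) $\sum_{k\in\mathbb{K}} |(\inf D_k)_+|^2 < +\infty$ and $\sum_{k\in\mathbb{K}} |(\sup D_k)_-|^2 < +\infty$.

(b) $\sum_{k\in\mathbb{K}} h_k^*(-(\inf D_k)_+) < +\infty$ and $\sum_{k\in\mathbb{K}} h_k^*((\sup D_k)_-) < +\infty$.

(c) $\sum_{k\in\mathbb{K}} |(\inf D_k)_+|^{r^*} < +\infty$ and $\sum_{k\in\mathbb{K}} |(\sup D_k)_-|^{r^*} < +\infty$.

Then the following hold:

(i) Suppose that (a) or (b) is satisfied. Then $G \in \Gamma_0(\ell^2(\mathbb{K}))$.

(ii) Suppose that (b) is satisfied. Then $\inf G(\ell^2(\mathbb{K})) > -\infty$.

(iii) Suppose that, for every $k \in \mathbb{K}$, $h_k \ge \eta|\cdot|^r$ for some $\eta \in \mathbb{R}_{++}$. Then (a)$\Rightarrow$(c)$\Rightarrow$(b).

(iv) Suppose that, for every $k \in \mathbb{K}$, $h_k - \eta|\cdot|^r \in \Gamma_0^+(\mathbb{R})$ for some $\eta \in \mathbb{R}_{++}$ and that (c) holds. Then, for
every $\eta' \in \left]0,\eta\right[$, there exists $H \in \Gamma_0(\ell^2(\mathbb{K}))$ such that $G\colon u \mapsto H(u) + \eta'\sum_{k\in\mathbb{K}} |\mu_k|^r$, $\operatorname{dom} G \subset \ell^r(\mathbb{K})$,
and $G$ is coercive in $\ell^r(\mathbb{K})$.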


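Proof. We first observe that, if there exist $(\chi_k)_{k\in\mathbb{K}} \in \ell^1_+(\mathbb{K})$ and $b \in \mathbb{R}_+$ such that

\[ (\forall k \in \mathbb{K}) \quad -g_k \le \chi_k + b|\cdot|^2, \tag{A.2} \]

then $G \in \Gamma_0(\ell^2(\mathbb{K}))$.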

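(i): Let $k \in \mathbb{K}$. Since

\[ (\forall \mu \in \mathbb{R}) \quad \sigma_{D_k}(\mu) = \begin{cases} \mu \sup D_k & \text{if } \mu \ge 0 \\ \mu \inf D_k & \text{if } \mu < 0, \end{cases} \tag{A.3} \]

we have

\[ \begin{aligned} (\forall \mu \in \mathbb{R}) \quad -g_k(\mu) &\le -\sigma_{D_k}(\mu) - h_k(\mu) \\ &\le \max\{(\mu)_-(\inf D_k)_+, (\mu)_+(\sup D_k)_-\} - h_k(\mu). \end{aligned} \tag{A.4} \]

Hence, in order to guarantee condition (A.2) for some $(\chi_k)_{k\in\mathbb{K}} \in \ell^1_+(\mathbb{K})$ and $b \in \mathbb{R}_{++}$, it is sufficient to
require condition (a) or (b) (note that $h_k^* \ge 0$, since $h_k(0) = 0$). Therefore in this case $G \in \Gamma_0(\ell^2(\mathbb{K}))$.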

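(ii): It follows from (A.4) that

\[ (\forall k \in \mathbb{K}) \quad -g_k \le \max\{h_k^*(-(\inf D_k)_+), h_k^*((\sup D_k)_-)\}. \tag{A.5} \]

Hence, for every $u \in \ell^2(\mathbb{K})$, $-G(u) \le \sum_{k\in\mathbb{K}} \max\{h_k^*(-(\inf D_k)_+), h_k^*((\sup D_k)_-)\} < +\infty$.

(iii): For every $k \in \mathbb{K}$, $h_k^* \le (\eta|\cdot|^r)^* = (r\eta)^{1-r^*}|\cdot|^{r^*}/r^*$. The statement therefore follows by observing
that, since $2 \le r^*$, $\ell^2(\mathbb{K}) \subset \ell^{r^*}(\mathbb{K})$.

(iv): Setting, for every $k \in \mathbb{K}$, $\tilde{h}_k = h_k - \eta'|\cdot|^r$, we have $g_k = \iota_{C_k} + \sigma_{D_k} + \tilde{h}_k + \eta'|\cdot|^r$, with
$(\eta - \eta')|\cdot|^r \le \tilde{h}_k \in \Gamma_0^+(\mathbb{R})$. It follows from (i) and (iii) that, for every $u = (\mu_k)_{k\in\mathbb{K}} \in \ell^2(\mathbb{K})$, $G(u) = H(u) + \eta'\sum_{k\in\mathbb{K}} |\mu_k|^r$, for some $H \in \Gamma_0(\ell^2(\mathbb{K}))$. □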


B Proximity operators of power functions 

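It follows from [7, Example 4.4] that, for every $\gamma \in \mathbb{R}_{++}$ and every $r \in [1,2]$,

\[ (\forall \mu \in \mathbb{R}) \quad \operatorname{prox}_{\gamma|\cdot|^r} \mu = \xi \operatorname{sign}(\mu), \quad\text{where}\quad \xi \ge 0 \quad\text{and}\quad \xi + r\gamma\xi^{r-1} = |\mu|. \tag{B.1} \]

There are several exponents $r$ for which Equation (B.1) can be solved explicitly, for instance $r \in \{3/2, 4/3, 5/4\}$
[7, 31]. However, in general, it must be solved iteratively.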
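
As an illustration of an explicitly solvable case, take $r = 3/2$. Substituting $z = \sqrt{\xi}$ in (B.1) gives the quadratic equation $z^2 + (3\gamma/2)z - |\mu| = 0$, whose nonnegative root is $z = \big(-3\gamma/2 + \sqrt{9\gamma^2/4 + 4|\mu|}\big)/2$, so that $\xi = z^2$. The sketch below implements this derivation; the function names are hypothetical, and `residual` is only a helper to verify that a candidate $\xi$ satisfies (B.1).

```python
import math

def prox_power_three_halves(mu, gamma):
    """Proximity operator of gamma * |.|^(3/2), obtained by solving
    xi + (3/2) * gamma * sqrt(xi) = |mu| through the substitution z = sqrt(xi)."""
    c = 1.5 * gamma
    z = 0.5 * (-c + math.sqrt(c * c + 4.0 * abs(mu)))   # nonnegative root in z
    return math.copysign(z * z, mu)

def residual(xi, mu, gamma, r):
    """Left-hand side minus right-hand side of the scalar equation in (B.1)."""
    return xi + r * gamma * xi ** (r - 1.0) - abs(mu)
```

For any $\mu$ and $\gamma > 0$, `residual(abs(prox_power_three_halves(mu, gamma)), mu, gamma, 1.5)` should vanish up to rounding errors.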
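Proposition B.1 Let $\mu \in \mathbb{R}$, let $\gamma \in \mathbb{R}_{++}$, let $r \in [1,2]$, and let $(r_1, r_2) \in [1,2]^2$ be such that $r_1 < r_2$.
Then the following hold:

(i) $\operatorname{prox}_{\gamma|\cdot|^r}\colon \mathbb{R} \to \mathbb{R}$ is strictly increasing, nonexpansive, odd, and differentiable, and $\operatorname{prox}_{\gamma|\cdot|^r} + \iota_{\mathbb{R}_+}$ is
convex.

(ii) We have

\[ \min\bigg\{\frac{|\mu|}{1 + r\gamma}, \Big(\frac{|\mu|}{1 + r\gamma}\Big)^{\frac{1}{r-1}}\bigg\} \le |\operatorname{prox}_{\gamma|\cdot|^r} \mu| \le \max\bigg\{\frac{|\mu|}{1 + r\gamma}, \Big(\frac{|\mu|}{1 + r\gamma}\Big)^{\frac{1}{r-1}}\bigg\}. \tag{B.2} \]

(iii) Suppose that $|\mu| > 1 + r_2\gamma$. Then $|\operatorname{prox}_{\gamma|\cdot|^{r_2}} \mu| < |\operatorname{prox}_{\gamma|\cdot|^{r_1}} \mu|$.

(iv) Suppose that $r > 1$ and that $|\mu| > 1 + r\gamma$. Then

\[ \frac{|\mu|}{1 + r\gamma} \le |\operatorname{prox}_{\gamma|\cdot|^r} \mu| < |\mu| - \gamma. \]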


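Proof. (i): It follows from [9, Lemma 2.2(iv) and Proposition 2.4] that $\operatorname{prox}_{\gamma|\cdot|^r}$ is nonexpansive,
increasing, and odd. Now set $\psi\colon \mathbb{R}_+ \to \mathbb{R}_+\colon \xi \mapsto \xi + r\gamma\xi^{r-1}$. Clearly $\psi$ is strictly increasing and
concave. Moreover it is differentiable on $\mathbb{R}_{++}$ and, for every $\xi \in \mathbb{R}_{++}$, $\psi'(\xi) = 1 + r(r-1)\gamma/\xi^{2-r}$.
Hence, from (B.1), for every $\mu \in \mathbb{R}_+$, $\operatorname{prox}_{\gamma|\cdot|^r} \mu = \psi^{-1}(\mu)$. This shows that $\operatorname{prox}_{\gamma|\cdot|^r}$ is strictly
increasing, convex, and differentiable on $\mathbb{R}_{++}$ with, for every $\mu \in \mathbb{R}_{++}$, $(\operatorname{prox}_{\gamma|\cdot|^r})'\mu = 1/\psi'(\operatorname{prox}_{\gamma|\cdot|^r} \mu)$, that
is

\[ (\operatorname{prox}_{\gamma|\cdot|^r})'\mu = \bigg(1 + \frac{r(r-1)\gamma}{(\operatorname{prox}_{\gamma|\cdot|^r} \mu)^{2-r}}\bigg)^{-1}. \tag{B.3} \]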


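(ii): According to (B.1), there exists $\xi \in \mathbb{R}_+$ such that $\operatorname{prox}_{\gamma|\cdot|^r} \mu = \operatorname{sign}(\mu)\xi$ and $\xi + r\gamma\xi^{r-1} = |\mu|$.
If $\xi \ge 1$, then $|\mu| = \xi + r\gamma\xi^{r-1} \le (1 + r\gamma)\xi$, hence $|\mu|/(1 + r\gamma) \le \xi = |\operatorname{prox}_{\gamma|\cdot|^r} \mu|$. If $\xi < 1$, then
$|\mu| = \xi + r\gamma\xi^{r-1} \le (1 + r\gamma)\xi^{r-1}$, hence $(|\mu|/(1 + r\gamma))^{1/(r-1)} \le \xi = |\operatorname{prox}_{\gamma|\cdot|^r} \mu|$. The first inequality in
(B.2) follows and the second is proved analogously.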

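(iii): In view of (B.1) there exist $\xi_1 \in \mathbb{R}_+$ and $\xi_2 \in \mathbb{R}_+$ such that

\[ \begin{cases} \operatorname{prox}_{\gamma|\cdot|^{r_1}} \mu = \operatorname{sign}(\mu)\xi_1 \quad\text{and}\quad \xi_1 + r_1\gamma\xi_1^{r_1-1} = |\mu| \\ \operatorname{prox}_{\gamma|\cdot|^{r_2}} \mu = \operatorname{sign}(\mu)\xi_2 \quad\text{and}\quad \xi_2 + r_2\gamma\xi_2^{r_2-1} = |\mu|. \end{cases} \tag{B.4} \]

If $|\mu| > 1 + \gamma r_2 > 1 + \gamma r_1$, it follows from (B.2) that

\[ 1 < \frac{|\mu|}{1 + r_1\gamma} \le \xi_1 \quad\text{and}\quad 1 < \frac{|\mu|}{1 + r_2\gamma} \le \xi_2. \tag{B.5} \]

Therefore, since $r_1 < r_2$ and $\xi_1 > 1$,

\[ \xi_2 + r_2\gamma\xi_2^{r_2-1} = |\mu| = \xi_1 + r_1\gamma\xi_1^{r_1-1} < \xi_1 + r_2\gamma\xi_1^{r_2-1}. \tag{B.6} \]

Hence, since $\xi \mapsto \xi + r_2\gamma\xi^{r_2-1}$ is strictly increasing on $\mathbb{R}_+$, we conclude that $\xi_2 < \xi_1$.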

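(iv): Since (B.1) implies that $\operatorname{prox}_{\gamma|\cdot|} \mu = \operatorname{sign}(\mu)(|\mu| - \gamma)$, we derive from (iii) that

\[ |\mu| > 1 + r\gamma \;\Rightarrow\; |\operatorname{prox}_{\gamma|\cdot|^r} \mu| < |\mu| - \gamma. \tag{B.7} \]

The first inequality in (iv) follows directly from (B.2). □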
Remark B.2


(i) The bounds given in (B.2) can be useful to initialize the bisection method to solve (B.1).

(ii) $(\operatorname{prox}_{\gamma|\cdot|^r})'0 = 0$, $(\operatorname{prox}_{\gamma|\cdot|^r})'\mu \le 1$, and $(\operatorname{prox}_{\gamma|\cdot|^r})'\mu \to 1$ as $\mu \to +\infty$.

(iii) $\operatorname{prox}_{\gamma|\cdot|^r}$ has no asymptote as $\mu \to +\infty$, since (B.1) yields $\operatorname{prox}_{\gamma|\cdot|^r} \mu - \mu = -r\gamma(\operatorname{prox}_{\gamma|\cdot|^r} \mu)^{r-1} \to -\infty$ as $\mu \to +\infty$.
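
The first item of the remark can be sketched as follows: the two quantities appearing in (B.2) bracket the solution $\xi$ of $\xi + r\gamma\xi^{r-1} = |\mu|$, and since the left-hand side is strictly increasing in $\xi$, plain bisection on that bracket converges. The code is a minimal sketch under the assumption $r \in \left]1,2\right]$, with $r$ not too close to 1 so that the power $1/(r-1)$ does not overflow in floating point. The function name, the stopping rule, and the tolerance are illustrative choices.

```python
import math

def prox_power(mu, gamma, r, tol=1e-12, max_iter=200):
    """Bisection sketch for prox of gamma * |.|^r with r in ]1, 2]:
    solve xi + r * gamma * xi**(r - 1) = |mu| on the bracket given by (B.2)."""
    if mu == 0.0:
        return 0.0
    target = abs(mu)
    ratio = target / (1.0 + r * gamma)
    # The two bounds of (B.2); which one is smaller depends on whether ratio < 1.
    lo, hi = sorted((ratio, ratio ** (1.0 / (r - 1.0))))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid + r * gamma * mid ** (r - 1.0) < target:
            lo = mid      # the root lies to the right of mid
        else:
            hi = mid      # the root lies to the left of mid (or at mid)
        if hi - lo <= tol * hi:
            break
    return math.copysign(0.5 * (lo + hi), mu)
```

For $r = 2$ the two bounds coincide and the bracket already equals the exact value $\mu/(1+2\gamma)$, so the loop exits after one pass.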